# Random strings of characters and some stats

Hello,

I've got two random number/statistics questions I'd like you to review. My first question is not directly related to PHP, but will be implemented in PHP, as explained in my second question, so let's go:

I want to generate 10000 strings of x characters, with one chance (or less) on a million that you can guess them by just randomly typing them. So I need to know what is the value of x.

I wrote the following equation :

    36^x/10000 = 1000000
    <=> 36^x = 10000 * 1000000
    <=> 36^x = 1010
    <=> x = ln(1010)/ln(36)
    <=> x = 23.025850929940456840179914546844/3.5835189384561100016249547167614
    <=> x = 6.4254860446923437997173954827712

So, a 7-characters string would be good enough.

So my first question is: is my reasoning OK? Knowing my math abilities, I doubt it very much! ;)

The second question I have is related to PHP's rand() function. I've read many times that rand() is not random enough, especially when generating long lists of this kind. Would you use something that's more powerful than rand(), are there stronger random functions, within PEAR for instance, or anything?

Thanks,
JFLac

Jul 17 '05 #1

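The length calculation in the first post can be checked numerically. What follows is a minimal Python sketch, not PHP and not code from the thread; the helper name `min_length` and its parameters are made up for illustration. It finds the smallest x with 36^x >= 10000 * 1000000 using exact integers, and compares that with the logarithm form from the post.

```python
import math


def min_length(alphabet_size, n_strings, odds):
    """Smallest x such that alphabet_size**x >= n_strings * odds.

    Hypothetical helper, not from the thread: it encodes the poster's
    condition 36^x / 10000 >= 1000000 with exact integer arithmetic.
    """
    target = n_strings * odds
    x = 0
    space = 1  # always equals alphabet_size ** x
    while space < target:
        space *= alphabet_size
        x += 1
    return x


# Closed form from the post: x = ln(10000 * 1000000) / ln(36)
x_real = math.log(10_000 * 1_000_000) / math.log(36)

print(round(x_real, 4))                   # about 6.4255
print(min_length(36, 10_000, 1_000_000))  # 7
```

Because x must be a whole number of characters, the real-valued result is rounded up, which is how 6.43 becomes 7.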
On 2 May 2005 13:55:18 -0700, jf********@gmail.com (Jean-François Lacrampe) wrote:

> I've got two random number/statistics questions I'd like you to review. My first question is not directly related to PHP, but will be implemented in PHP, as explained in my second question, so let's go: I want to generate 10000 strings of x characters, with one chance (or less) on a million that you can guess them by just randomly typing them. So I need to know what is the value of x.

I'm not 100% clear on the "them" in the sentence; are you saying you want less than 1/1000000 chance of guessing ONE of the 10000 strings, or 1/1000000 chance of guessing the ENTIRE SET of 10000 strings?

> I wrote the following equation : 36^x/10000 = 1000000

Depends on the interpretation above. Not sure I get how the 10000 is involved here though.

> <=> 36^x = 10000 * 1000000
> <=> 36^x = 1010

10000 * 1000000 = 1010 ? Is that supposed to be 10^10 ?

> <=> x = ln(1010)/ln(36)
> <=> x = 23.025850929940456840179914546844/3.5835189384561100016249547167614

Apparently so :-)

> <=> x = 6.4254860446923437997173954827712
> So, a 7-characters string would be good enough.

If you want at worst 1/1000000 chance of guessing any string, isn't the number of strings irrelevant if they're random? i.e. it's just

    36^x > 1000000
    => x > ln(1000000)/ln(36)
    => x > 3.855

So minimum number of chars = 4. (36^3 = 46656, 36^4 = 1679616)

The odds of guessing ALL the strings surely head well out of the 1 in 1000000 range for 10000 strings very quickly...

> So my first question is: is my reasoning OK? Knowing my math abilities, I doubt it very much! ;)
>
> The second question I have is related to PHP's rand() function. I've read many times that rand() is not random enough, especially when generating long lists of this kind. Would you use something that's more powerful than rand(), are there stronger random functions, within PEAR for instance, or anything?

mt_rand() uses the Mersenne Twister pseudorandom algorithm, which is typically better (and as a bonus it's faster too). If you want to get really serious you'll need to base it on some sort of truly physical phenomenon, e.g. with RNG hardware, which is often based on random thermal fluctuations.

-- Andy Hassall / / Space: disk usage analysis tool

Jul 17 '05 #2

Andy Hassall wrote:

> If you want at worst 1/1000000 chance of guessing any string, isn't the number of strings irrelevant if they're random?

No, because each additional string means an additional try. Think of it this way: a person comes up with a random string of letters, and the computer gets 10000 tries to guess it.

Jul 17 '05 #3

Andy Hassall wrote:

> I'm not 100% clear on the "them" in the sentence; are you saying you want less than 1/1000000 chance of guessing ONE of the 10000 strings, or 1/1000000 chance of guessing the ENTIRE SET of 10000 strings?

I meant the odds of guessing any of the 10000 strings, of course! :-) The odds of guessing the entire set must be really, really low!

> 10000 * 1000000 = 1010 ? Is that supposed to be 10^10 ?

Well, I wrote the equation in another editor who was sooo happy to show me it was able to display the 10^10 graphically. Too bad it forgot to copy/paste it back to me with the circumflex.

> If you want at worst 1/1000000 chance of guessing any string, isn't the number of strings irrelevant if they're random?

Well, keeping in mind my very 'intuitive' and weak knowledge of math, I'd guess that the more strings you put in the list, the more chances you have to guess one (any) of them. If for instance I had a list long enough to contain all the possible combinations, the odds would be 1/1, right? If you divide the list by two, the odds are 1/2. And so on.

So the number of items in the list seems to matter: that's how I came with the 10000 * 1000000 thing (by doing lots of intermediate and stupid steps on a sheet of paper). I'm not sure at all that I put the 10000 where I should have in the equation, though, hence my initial question.

Now, I'm talking about things I don't understand (math) in a language that isn't my native language and I reckon that I'm a bit awkward at explaining my thoughts. :-)

> mt_rand() uses the Mersenne Twister pseudorandom algorithm, which is typically better (and as a bonus it's faster too). If you want to get really serious you'll need to base it on some sort of truly physical phenomenon, e.g. with RNG hardware, which is often based on random thermal fluctuations.

I could also use a webcam on a lava lamp and produce my results using the webcam info, but I guess I don't need that randomness. I just wanted to know what was my best bet with what PHP can give me, with minimal hassle. ;-)

Thanks for your answers,
JFLac

Jul 17 '05 #4
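The intuition in post #4 (full list gives odds of 1/1, half the list gives 1/2) amounts to saying that a single random guess hits one of N distinct strings with probability N / 36^x. A small Python sketch of that ratio, under the assumptions that the strings are distinct and the guess is uniform over all possible strings; `hit_probability` is an illustrative name, not something from the thread:

```python
from fractions import Fraction


def hit_probability(alphabet_size, length, n_strings):
    """Chance that one uniformly random guess matches any of n_strings
    distinct strings of the given length (hypothetical helper)."""
    total = alphabet_size ** length  # number of possible strings
    if n_strings > total:
        raise ValueError("more strings than possible combinations")
    return Fraction(n_strings, total)


one_in_a_million = Fraction(1, 1_000_000)

# The list contains every combination: a guess always hits.
print(hit_probability(36, 2, 36 ** 2))       # 1
# The list contains half of the combinations: one chance in two.
print(hit_probability(36, 2, 36 ** 2 // 2))  # 1/2
# With 10000 strings, six characters miss the target and seven meet it.
print(hit_probability(36, 6, 10_000) <= one_in_a_million)  # False
print(hit_probability(36, 7, 10_000) <= one_in_a_million)  # True
```

Requiring N / 36^x <= 1/1000000 is the same condition as 36^x >= 10000 * 1000000, so the 10000 does belong on that side of the equation.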

Jean-Francois Lacrampe wrote:

> I want to generate 10000 strings of x characters, with one chance (or less) on a million that you can guess them by just randomly typing them. So I need to know what is the value of x.

OK, one in a million chance of successfully guessing 10,000 strings equals 0.9986 chance of successfully guessing a single string:

    0.9986 ^ 10000 = 8.23412E-07 ~ 1E-06 (one in a million)

In other words, even if you are virtually certain to get a single string right, it's still virtually impossible to get 10,000 of them right. So a one-character string will suffice. In fact, even a one-bit value (0 or 1) would be an overkill. :)

> The second question I have is related to PHP's rand() function. I've read many times that rand() is not random enough, especially when generating long lists of this kind. Would you use something that's more powerful than rand(), are there stronger random functions, within PEAR for instance, or anything?

Check out mt_rand(): http://www.php.net/mt_rand

Cheers,
NC

Jul 17 '05 #5
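NC's figure comes from reading the requirement as "guess all 10000 strings": if each string is guessed independently with probability p, then p^10000 = 10^-6. A short Python check of that arithmetic; the variable names are illustrative, and independence between guesses is an assumption of this reading:

```python
import math

N_STRINGS = 10_000
TARGET = 1e-6  # one in a million for guessing the whole set

# Solve p ** N_STRINGS == TARGET for the per-string probability p.
p_single = math.exp(math.log(TARGET) / N_STRINGS)
print(round(p_single, 4))  # 0.9986

# Going the other way with the rounded figure quoted in the post.
p_all = 0.9986 ** N_STRINGS
print(f"{p_all:.3e}")      # about 8.234e-07
```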

NC wrote:

> Jean-Francois Lacrampe wrote:
>
> > I want to generate 10000 strings of x characters, with one chance (or less) on a million that you can guess them by just randomly typing them. So I need to know what is the value of x.
>
> OK, one in a million chance of successfully guessing 10,000 strings equals 0.9986 chance of successfully guessing a single string: 0.9986 ^ 10000 = 8.23412E-07 ~ 1E-06 (one in a million)

As I said in another branch of the thread, I wasn't clear enough: I meant 'one chance (or less) on a million that you can guess _any_ of them'.

Anyway... Here's the code I wrote to generate my 10000 strings, just in case it's useful to somebody browsing the archives, someday. The random function is pretty much the same as the one you see on every php tutorial, but it uses mt_rand() instead of rand() as many of you have advised me, and I wrote a (very inefficient) dupe checker. Optimizations and ideas are welcome, but that's just for the fun of it: I'll generate these strings just once, so it doesn't matter if it takes one full minute, it will only be run once. :-)

'; print_r (\$values); echo ''; ?>

JFLac

Jul 17 '05 #6
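The PHP listing from post #6 did not survive in this copy of the thread; only its closing `print_r` fragment remains. The following is therefore not the poster's code but a rough Python sketch of the same idea: random 7-character strings plus a duplicate check. The 36-symbol alphabet is assumed to be a-z and 0-9, the name `generate_codes` is hypothetical, and Python's `secrets` module stands in for PHP's mt_rand().

```python
import secrets
import string

# Assumption: the 36 symbols are lowercase letters plus digits.
ALPHABET = string.ascii_lowercase + string.digits


def generate_codes(count, length, alphabet=ALPHABET):
    """Return `count` distinct random strings of `length` characters.

    Hypothetical helper. A set makes the duplicate check a constant-time
    lookup instead of rescanning the whole list for every new string.
    """
    if count > len(alphabet) ** length:
        raise ValueError("not enough distinct strings of this length")
    seen = set()
    while len(seen) < count:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        seen.add(candidate)  # adding an existing string leaves the set unchanged
    return sorted(seen)


codes = generate_codes(10_000, 7)
print(len(codes), codes[:3])
```

With about 7.8 * 10^10 possible strings, collisions among 10000 draws are rare, so the loop almost always finishes in exactly 10000 iterations. `secrets` draws from the operating system's random source rather than a Mersenne Twister, which is arguably the safer choice if the strings are meant to resist guessing.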

On 3 May 2005 03:44:19 -0700, "Jean-François Lacrampe" wrote:

> Well, keeping in mind my very 'intuitive' and weak knowledge of math, I'd guess that the more strings you put in the list, the more chances you have to guess one (any) of them. If for instance I had a list long enough to contain all the possible combinations, the odds would be 1/1, right? If you divide the list by two, the odds are 1/2. And so on.
>
> So the number of items in the list seems to matter: that's how I came with the 10000 * 1000000 thing (by doing lots of intermediate and stupid steps on a sheet of paper).

Ah, yes of course. OK, I agree with your maths, looks right.

-- Andy Hassall / / Space: disk usage analysis tool

Jul 17 '05 #7

