On Feb 16, 6:33 pm, "osmium" <r124c4u...@comcast.netwrote:
"Mensanator" wrote:
On Feb 14, 10:05 am, "osmium" <r124c4u...@comcast.netwrote:
"Surya Dutt" wrote:
Did someone happen to come accross and algorithm or a C++ program that
simulates the Smith Collge Diploma problem?
"At Smith College, the graduation exercises traditionally proceed as
follows: Although
each diploma is made out to a particular girl, all the diplomas are
initially given out at random. All of the girls who do not get >their
own
diplomas then form a circle, and each passes the diploma she has to the
girl on her right. Those who now have >their own diplomas drop out, and
the
remaining girls again pass their diplomas to the right, and so on. This
procedure is >repeated until each girl has her own diploma. If there are
n
girls in the graduating class what is the probability that it takes
>precisely k passes before each girl has her own diploma?"
I don't think a computer simulation provides a useful way of answering
the
question asked.
Really?
Consider the results of running a million simulations for n
= 500 students.
Ok, but I did 1000 so it would fit on my spreadsheet.
Now make a histogram of the resulting k's with the
probability of k on the y axis.
Ok.
For most of its length the graph will be a
line close to zero with a fairly narrow and very high spike around the
expected number of interchanges.
Not if you autoscale the chart.
Now do another million simulations.
Why?
Very little information is added.
Right, that's why the first run is sufficient.
There is no assurance at all that the answers
are even *relatively* right, that is a k value of 100 could show a lower
probability of occurring than a k of 75.
Huh? The mean for N students is N/2 (that's what the
simulation will tell if you didn't already know it),
so k=100 will not have a lower probability than k=75,
that much is obvious.
Then why does your simulation show that p(235) is .6 and p(236) is .1?
Because it's a simulation. You don't use the results
of the simulation to determine the probability, you
use the simulation to determine the distribution.
236
is closer to what I call the "spike", why is it not the larger number?
That's called "luck" in the trade.
Make
a few more million runs and *this particular* glitch will most likely go
away.
At 10000, it didn't go away, but it was less pronounced.
But it
won't solve new, similar glitches for much smaller values of k - such as
100.
And it's not supposed to. You're not seeing the forest
for the trees. The individual trees (data points from
the simulation) don't tell you the probability. But
since collectively they show that the simulation is a
normal distribution (forest), we can determine
mathematically what the probability is for k=100 or
k=75.
To wit:
3.4838E-130 for k=100
2.7294E-176 for k=75
Run your own simulation and prove me wrong.
>
I would expect most of the
probabilities to start out with two or three zeroes after the decimal
point
before encountering a significant digit
Yeah, but what does that have to do with determining the
probability of k passes?
If the question were "What is the expected number of interchanges
required,
a simulation would be fine.
I think you glossed over that sentence, your response is all about the cases
which I explicitly bless in that sentence.
No, it wasn't. It was about using a normal distribution
to determine the probabilities. I guess I took for
granted that you could see how to use this for k=100
and k=75.
I was simply warning the OP that
he could not answer the question as written via a simulation.
IF you only look at the trees.
He *could*,
however, answer a somewhat similar question - which is what you did.
No, I answered the original question
(the NORMDIST(k,Mean,StandardDeviation,FALSE)
column shown below) by looking at the forest.
>
But IMO to get the "probability of precisely k
passes" via simulation is doomed to failure
Really?
Even if we are willing to
settle for any reasonable approximation of precisely, it is still doomed.
My reasonable approximation looked pretty good.
You only provide values for 37 of the 500 possible k's and leave the reader
to assume all the other cases are zero.
I didn't leave the reader to assume that. Again, I took
for granted that the reader would realize that he can
simply plug any value of k into
NORMDIST(k,Mean,StandardDeviation,FALSE) to see what the
probability is.
But in fact, they just happened to
be 0
They came out 0 in the simulation, but we don't
use the simulation to find the probability, we use
the simulation to find the distribution from which
we can find the probability for any k. For example,
here are the probabilities for some k values outside
my simulation results.
k probability
300 0.00000000000010928528115%
301 0.00000000000002984982860%
302 0.00000000000000794360258%
303 0.00000000000000205962746%
304 0.00000000000000052030181%
305 0.00000000000000012806118%
306 0.00000000000000003070967%
307 0.00000000000000000717511%
308 0.00000000000000000163334%
309 0.00000000000000000036226%
310 0.00000000000000000007828%
311 0.00000000000000000001648%
312 0.00000000000000000000338%
313 0.00000000000000000000068%
See, no simulation necessary.
in only 1000 cases you used to simulate 500! cases.
You never took a statistics class, eh?
>
When I see the word "precisely", as in this problem, my ears perk up.
There is no permission to ignore or slough off on certain values of k.
I didn't ignore or slough off those other k values.
I gave you an Excel function, all you have to do is
plug in any value you want (provided you know the
mean and the standard deviation).
Consider
a similar situation. If someone said "Toss a fair coin 500 times.
Precisely how mnany times do you get 27 heads?"
Wrong use of "precisely" here. You _can_ ask prcisely
how many 500-bit numbers have a pop-count of 27:
334810521009929044447947514934325903028914000
But you cannot say precisely which of the
3273390607896141870013189696827599152216642046043
0647894832913680961337964046745548832700923259041
5715088668412756007100921725654588539305332852758
9376
outcomes you'll get.
What would the answer be?
(The probability is left as an excercize for the student.)
I see these possible answers:
o I don't know
o a really small number
o I don't care
o zero
o I'm tired and want to go to sleep
o a number which answers the question.
Is there somne reason you would not answer "zero" as you did in the Smith
College situation?
I never answered 0, I left the values of k not covered
by the simulation undetermined. But undetermined
doesn't mean they are 0 as I showed above.
The only useful right
answer is the last one, computed from knowledge of the binomail distributon.
You can't get the answer from simulation, you won't live long enough.
*That* was my point.
And my point was you aren't supposed to get the answer
from the simulation, that's why I said you're looking
at it he wrong way.
>
The problem is not so inordinately difficult that analyzing it is plain
impossible; as a matter of fact the very first hit I got on Google
provided
an analytic solution. But note that it might be necessary to use a big
number library to come up with numerical answers.
Uh, maybe you're looking at it the wrong way?
It really comes down to, what does the word "precisely" mean, if anything,
in the problem statement. I took its
literal meaning, since I have no reason to believe it was a "noise" word
inserted to confuse me.
What part of NORMDIST(k,Mean,StandardDeviation,FALSE)
don't you understand?
>
Moral: Programmers do not eliminate the need for mathematicians.
If I use the Excel function for Normal Distribution using
Mean and StandardDeviation obtained from the simulation for
N=500, wouldn't you say that this is a "reasonable approximation"
when compared to the actual probabilities from the simulation?
k Actual NORMDIST(k,Mean,StandardDeviation,FALSE)
233 0.10% 0.11%
234 0.20% 0.18%
235 0.60% 0.27%
236 0.10% 0.40%
237 0.60% 0.57%
238 1.10% 0.80%
239 1.00% 1.10%
240 1.40% 1.47%
241 1.70% 1.92%
242 2.80% 2.44%
243 3.40% 3.01%
244 3.40% 3.62%
245 4.70% 4.25%
246 5.30% 4.86%
247 4.20% 5.42%
248 5.90% 5.88%
249 6.00% 6.21%
250 5.50% 6.40%
251 7.80% 6.43%
252 6.10% 6.28%
253 4.60% 5.99%
254 5.50% 5.56%
255 5.30% 5.03%
256 4.90% 4.43%
257 4.80% 3.80%
258 3.50% 3.18%
259 2.50% 2.59%
260 2.00% 2.06%
261 1.10% 1.59%
262 1.40% 1.20%
263 0.70% 0.88%
264 0.70% 0.63%
265 0.30% 0.44%
266 0.20% 0.30%
267 0.20% 0.20%
268 0.10% 0.13%
269 0.20% 0.08%