473,396 Members | 1,891 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

Probabilistic Data Structure

Hi All!,

Does anyone know a link to a website or know the best way to create a data
structure that will pick random elements from itself with assigned
probabilities?

For example, say I created a bag of 3 marbles(red, green, and blue) and the
red marble had a 50% change of being picked while the green and blue marbles
both had 25% chance of being picked. they should be picked randomly, but
picked in such a way that the red marble will be picked half the time and
the other 2 a quarter of the time. What is the best way to do this
algorithmically?

Thanks,
David
or*******@yahoo.com


Jul 17 '05 #1
12 3108
David Turner wrote:
Hi All!,

Does anyone know a link to a website or know the best way to create a data
structure that will pick random elements from itself with assigned
probabilities?

For example, say I created a bag of 3 marbles(red, green, and blue) and the
red marble had a 50% change of being picked while the green and blue marbles
both had 25% chance of being picked. they should be picked randomly, but
picked in such a way that the red marble will be picked half the time and
the other 2 a quarter of the time. What is the best way to do this
algorithmically?


The way I would do it, for your example, would be to create an array of
4 elements, with 0 and 1 = 'red', 2 = 'green' and 3 = 'blue', then
randomly select elements 0 to 3 evenly.

SteveE

Jul 17 '05 #2
David Turner wrote:
Hi All!,

Does anyone know a link to a website or know the best way to create a data
structure that will pick random elements from itself with assigned
probabilities?

For example, say I created a bag of 3 marbles(red, green, and blue) and the
red marble had a 50% change of being picked while the green and blue marbles
both had 25% chance of being picked. they should be picked randomly, but
picked in such a way that the red marble will be picked half the time and
the other 2 a quarter of the time. What is the best way to do this
algorithmically?

Thanks,
David
or*******@yahoo.com


If you only need divisions of two, you might be able to cook up somthing
using a binary tree.

Jul 17 '05 #3
"David Turner" <or*******@yahoo.com> wrote in message news:<7H*******************@news4.srv.hcvlny.cv.ne t>...
Hi All!,

Does anyone know a link to a website or know the best way to create a data
structure that will pick random elements from itself with assigned
probabilities?

For example, say I created a bag of 3 marbles(red, green, and blue) and the
red marble had a 50% change of being picked while the green and blue marbles
both had 25% chance of being picked. they should be picked randomly, but
picked in such a way that the red marble will be picked half the time and
the other 2 a quarter of the time. What is the best way to do this
algorithmically?

Thanks,
David
or*******@yahoo.com


hi there
u can maintain array of nodes with their respective probabilities & an
array to maintain how many times the node was chosen till time & then
generate a random number & check if the no of times > no of times it
should have been chosen then generate another no else chose that node
this is it
regards
amey
Jul 17 '05 #4
How about if you define a set
of intervals that subdivide the
interval [0,1) in proportion to
the probabilities you desire. Then
generate a random number whose
value selects one of the intervals.
e.g. item 1 is 50 percent so interval is [0,0.5)
item 2 is 25 % so [0.5,0.75)
item 3 is 25% so [0.75,1)
random number is 0.67 so item 2 is selected.

Data structure is an array containing the lower and upper
bounds of each interval. Index of the array is the item.

"Amey Samant" <am*****@yahoo.com> wrote in message
news:66**************************@posting.google.c om...
"David Turner" <or*******@yahoo.com> wrote in message

news:<7H*******************@news4.srv.hcvlny.cv.ne t>...
Hi All!,

Does anyone know a link to a website or know the best way to create a data structure that will pick random elements from itself with assigned
probabilities?

For example, say I created a bag of 3 marbles(red, green, and blue) and the red marble had a 50% change of being picked while the green and blue marbles both had 25% chance of being picked. they should be picked randomly, but
picked in such a way that the red marble will be picked half the time and the other 2 a quarter of the time. What is the best way to do this
algorithmically?

Thanks,
David
or*******@yahoo.com


hi there
u can maintain array of nodes with their respective probabilities & an
array to maintain how many times the node was chosen till time & then
generate a random number & check if the no of times > no of times it
should have been chosen then generate another no else chose that node
this is it
regards
amey

Jul 17 '05 #5
<snip>
The way I would do it, for your example, would be to create an array
of
4 elements, with 0 and 1 = 'red', 2 = 'green' and 3 = 'blue', then
randomly select elements 0 to 3 evenly.

SteveE
</snip>

hi steve
this wont work.
consider if there are around 100 elements
& lets say an element 'red' has 10% prob & you have placed it in 0-9

in this case ... do you guarantee that once the 0-9 numbers have
already been selected randomly once, they are not selected before
other numbers are selected randomly
i mean random generator does not guarantee that the number selected
once is not selected again before all numbers are exhausted
thus in your case though you have provided only space for 10 for 'red'
it might be selected more than 10 times in 100 experiments thus
violating its given prob

hope i made it clear

regards
amey
Jul 17 '05 #6
>
hi steve
this wont work.
consider if there are around 100 elements
& lets say an element 'red' has 10% prob & you have placed it in 0-9

in this case ... do you guarantee that once the 0-9 numbers have
already been selected randomly once, they are not selected before
other numbers are selected randomly
i mean random generator does not guarantee that the number selected
once is not selected again before all numbers are exhausted
thus in your case though you have provided only space for 10 for 'red'
it might be selected more than 10 times in 100 experiments thus
violating its given prob


OK, it does rely on the random number generator producing an even spread.

If you need guarantee an even distribution then what about shuffling the
100 elements randomly and selecting incrementally, repeating the shuffle
every 100 elements?

SteveE

Jul 17 '05 #7
SteveE <ea****@btinternet.com_> wrote in message news:<Ai*********************@news-text.cableinet.net>...
OK, it does rely on the random number generator producing an even spread.

If you need guarantee an even distribution then what about shuffling the
100 elements randomly and selecting incrementally, repeating the shuffle
every 100 elements?

SteveE


hi StevE
see you agreed that random number generation would cause problem in
first case then again in your answer you are talking about shuffling
elements RANDOMLY :d how would you solve the second RANDOM case then
;)
neways
another issues with this could be COST
if there are 1000 elements , shuffling 1000 elements for every element
would make your program much slower as the complexity increases
TREMENDOUSLY with this logic

regards
amey
Jul 17 '05 #8
> see you agreed that random number generation would cause problem in
first case then again in your answer you are talking about shuffling
elements RANDOMLY :d how would you solve the second RANDOM case then
;)
There will always be restriction of psuedo-random numbers never trully
being random (unless you use some exotic external generator) when using
computers. This method will guarantee an even distribution over the
number of iterations required, you just need to ensure you shuffle
sufficiently to give the 'appearance' of being genuinely random.
neways
another issues with this could be COST
if there are 1000 elements , shuffling 1000 elements for every element
would make your program much slower as the complexity increases
TREMENDOUSLY with this logic


No, I don't think the complexity changes at all. The actual data
retrieval would be very fast, but you would pay a time cost when
shuffling depending on the distribution, number of elements required and
iterations used. Then again, shuffling data around is what computers are
all about ;)

I think it all boils down to the exact requirements and constraints. For
instance, the original poster's example would only require 4 elements.
You could even factor this up to 40000 elements to defer the shuffling,
depending on the frequency of calls, memory available, etc.

There is another beneficial side-effect of using this method, which is
that the time taken for data retrieval and shuffling is constant (as
much as you can be in Java), whereas a 'hit and miss' approach, where
values are rejected until within the required distribution, is entirely
unpredictable time-wise (well, pseudo-unpredictable at least :)

SteveE

Jul 17 '05 #9
hi Steve
its great to discuss & clear things up ....
There will always be restriction of psuedo-random numbers never trully
being random (unless you use some exotic external generator) when using
computers. This method will guarantee an even distribution over the
number of iterations required, you just need to ensure you shuffle
sufficiently to give the 'appearance' of being genuinely random.
i tried this code

import java.util.*;
class TestRandom
{
public static void main(String[] args)
{
Random ran =new Random(10);
for(int i=0;i<10;i++)
System.out.println(i+" - "+ran.nextInt(10));
}
}

this should generate 10 random numbers. i goes from 0 - 9 & random
number bounds are 0 - 9 as well .

& the results were as follows
OUTPUT :

0 - 3
1 - 0
2 - 3
3 - 0
4 - 6
5 - 6
6 - 7
7 - 8
8 - 1
9 - 4

is this evenly distributed ?
how would you shuffle such that you always get HIT or at least try to
maximize HITs ?
No, I don't think the complexity changes at all. The actual data
retrieval would be very fast, but you would pay a time cost when
shuffling depending on the distribution, number of elements required and
iterations used. Then again, shuffling data around is what computers are
all about ;)
swapping has always been an expensive task especiaaly when it comes
inside nested loop.
here is a joke (my own creation) "swapping = three variables , three
instructions to swap two values " ;)

I think it all boils down to the exact requirements and constraints. For
instance, the original poster's example would only require 4 elements.
You could even factor this up to 40000 elements to defer the shuffling,
depending on the frequency of calls, memory available, etc.


no, i agree that requirements play role but what when number of values
is known only at runtime?
even if you know that there are 4 elements today, do you assume that
there will not be more than 100 elements in your next version/upgrade
;) ?
this way your entire logic would change with change in quantity which
may not be desirable.
when the no. of elements are FIXED then it is ok to choose by this
way.

regards
amey
Jul 17 '05 #10
For a long time people wondered how many "shuffles" it takes
to say the cards are now sufficiently mixed for the next deal.
Within the last several years I believe there was a proof that
it takes 5.
"Amey Samant" <am*****@yahoo.com> wrote in message
news:66**************************@posting.google.c om...
hi Steve
its great to discuss & clear things up ....
There will always be restriction of psuedo-random numbers never trully
being random (unless you use some exotic external generator) when using
computers. This method will guarantee an even distribution over the
number of iterations required, you just need to ensure you shuffle
sufficiently to give the 'appearance' of being genuinely random.


i tried this code

import java.util.*;
class TestRandom
{
public static void main(String[] args)
{
Random ran =new Random(10);
for(int i=0;i<10;i++)
System.out.println(i+" - "+ran.nextInt(10));
}
}

this should generate 10 random numbers. i goes from 0 - 9 & random
number bounds are 0 - 9 as well .

& the results were as follows
OUTPUT :

0 - 3
1 - 0
2 - 3
3 - 0
4 - 6
5 - 6
6 - 7
7 - 8
8 - 1
9 - 4

is this evenly distributed ?
how would you shuffle such that you always get HIT or at least try to
maximize HITs ?
No, I don't think the complexity changes at all. The actual data
retrieval would be very fast, but you would pay a time cost when
shuffling depending on the distribution, number of elements required and
iterations used. Then again, shuffling data around is what computers are
all about ;)


swapping has always been an expensive task especiaaly when it comes
inside nested loop.
here is a joke (my own creation) "swapping = three variables , three
instructions to swap two values " ;)

I think it all boils down to the exact requirements and constraints. For
instance, the original poster's example would only require 4 elements.
You could even factor this up to 40000 elements to defer the shuffling,
depending on the frequency of calls, memory available, etc.


no, i agree that requirements play role but what when number of values
is known only at runtime?
even if you know that there are 4 elements today, do you assume that
there will not be more than 100 elements in your next version/upgrade
;) ?
this way your entire logic would change with change in quantity which
may not be desirable.
when the no. of elements are FIXED then it is ok to choose by this
way.

regards
amey

Jul 17 '05 #11
Amey Samant wrote:
hi Steve
its great to discuss & clear things up ....
Nothing like a good debate :)
snipped code< is this evenly distributed ?
how would you shuffle such that you always get HIT or at least try to
maximize HITs ?
OK, you would probably get a more even distribution over say, 1,000,000
iterations. It also depends on how good the random number generator is :)

I probably didn't explain my second idea very well, so here's some code:

---
import java.util.Collections;
import java.util.Random;
import java.util.Vector;
public class Distribute {
public static void main(String[] args) {

Random rand = new Random(System.currentTimeMillis());
int red = 0, green = 0, blue = 0;

//Use a Vector so we can use the Collections methods
Vector el = new Vector();
//50% distribution of Red, 25% Green, 25% Blue
el.add("Red");
el.add("Red");
el.add("Green");
el.add("Blue");

//main loop, let's try 400 (100*4) iterations
final int ITER = 100;
for (int j = 0; j < ITER; j++) {

//Nice handy method to shuffle the elements :)
//Note that shuffling elements only swaps
//pointers and not the data itself.
Collections.shuffle(el, rand);

//display the shuffled elements
for (int i = 0; i < el.size(); i++) {

System.out.println(el.get(i));

//let's keep a count of the different values
if (el.get(i).equals("Red"))
red++;
else if (el.get(i).equals("Green"))
green++;
else
blue++;
}
}

System.out.println("Distribution:");
System.out.println(" Red " + (float)red / (ITER *
el.size()));
System.out.println(" Green " + (float)green / (ITER *
el.size()));
System.out.println(" Blue " + (float)blue / (ITER *
el.size()));
}
}
---
no, i agree that requirements play role but what when number of values
is known only at runtime?
even if you know that there are 4 elements today, do you assume that
there will not be more than 100 elements in your next version/upgrade
;) ?
this way your entire logic would change with change in quantity which
may not be desirable.
when the no. of elements are FIXED then it is ok to choose by this
way.


The above code could be incorporated into an object where the number of
values and their weighting are assigned at instantiation, with a method
to return the values.

Cheers,
SteveE

Jul 17 '05 #12
SteveE <ea****@btinternet.com_> wrote in message news:<PR*********************@news-text.cableinet.net>...
Amey Samant wrote: I probably didn't explain my second idea very well, so here's some code:

hi

ye the code really works great
:D

cheers
amey
Jul 17 '05 #13

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Mike Jones | last post by:
need help with data structures.Looking for ways to start, sample code, anything Program description: Design and implement a Visual C++ .NET program that inserts values into a data...
2
by: yee young han | last post by:
I need a fast data structure and algorithm like below condition. (1) this data structure contain only 10,000 data entry. (2) data structure's one entry is like below typedef struct _DataEntry_...
6
by: Amy | last post by:
Hi there: I'm trying to link two large data sets collected from two different sources for epidemiological research. These two data sets share a few common fields such as Social Insurance Number,...
11
by: theshowmecanuck | last post by:
As a matter of academic interest only, is there a way to programmatically list the 'c' data types? I am not looking for detail, just if it is possible, and what function could be used to...
3
by: Kiran B. | last post by:
Hi, I am new to .net. I have two Data Structure Type ... Sturcture A and Structure B. Structure A Public Fname as String Public LastName as String Public City as String Public Zip as String...
11
by: Macca | last post by:
Hi, I'm writing an application that will pass a large amount of data between classes/functions. In C++ it was more efficient to send a pointer to the object, e.g structure rather than passing...
3
by: aurora | last post by:
This is an entry I just added to ASPN. It is a somewhat novel technique I have employed quite successfully in my code. I repost it here for more explosure and discussions. ...
29
by: zoltan | last post by:
Hi, The scenario is like this : struct ns_rr { const u_char* rdata; }; The rdata field contains some fields such as :
30
by: Charles Law | last post by:
Here's one that should probably have the sub-heading "I'm sure I asked this once before, but ...". Two users are both looking at the same data, from a database. One user changes the data and...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.