473,883 Members | 1,671 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How to select record based on probability?

TheSmileyCoder
2,322 Recognized Expert Moderator Top Contributor
I have a table of names, with 2 fields in it, dblProbability and txName, where dblProbabilty is the relative probability that the Name in that row should be selected. How can I get 1 random name from my table, taking into account the probability that it should occur?
Jun 5 '12
25 5650
zmbd
5,501 Recognized Expert Moderator Expert
dsanto It certainly does re-itereate thru the records:
Expand|Select|Wrap|Line Numbers
  1.             'test for rejection *****
  2.             If Uj >= Pj Then GoTo ProcessStart
wherein "ProcessSta rt" goes back thru the process of rnd#, seek, get wj, calc wj/Wmax then the above comparison.
The other issue is, that while using the seek method only the first matching is returned so a lower/higher ranked entry may occur after the returned record unless the recordset is ordered on the ranking.
{edit}
Sorry, I should have added that the examples you've given are still very viable and a method that I have used myself in a smaller project (hence the posted document) - I was in a hurry to get to the other lab... no excuse, just an explanation.
{edit}


Rabbit YEP, YEP...
"...total sum of frequencies, you get what you call a weighted probability..."
YES! That is what I've been generating... then pull the normal distribution from the RND against the weighted p.

I';ve got to go to another lab so I'll have to take a closer look at the offered query this evening.

-z
Jun 7 '12 #21
dsatino
393 Contributor
zmbd: No need to point out pieces of code. I wrote it so I know what it does.

Yes, it goes back through a portion of the 'process', but it does not 're-iterate' through any records. The only 'iteration' is the primary loop based on the sample size. The only movement through records is the .seek of the random record to test. And on that note, it's seeking on a unique index so there is no issue with using .seek in the example I provided.
Jun 7 '12 #22
zmbd
5,501 Recognized Expert Moderator Expert
Sorry I haven’t gotten back to this sooner… my day off and I have the three kids. A 9yr and twins 2.5yr. Love them very much; however, they take a lot more of my time!
But it won’t last forever, so I have to get the hugs in now (esp. from my son...) all too soon I won’t be the cool Daddy for several years – then we get smarter and cooler again… when they have their kids! };-)

dsatino
No offense intended by quoting your code. Given, that you wrote the code, you very well understand the code.

I do not understand your anger... o-O

However, not everyone reading this forum will have the same level of understanding of vba nor will they have opened the attachment and have reviewed the code, or it may be the fact that they are unable to open the attachment for various reasons thus will not understand an oblique reference to the code.

How then is one to reference the section of code without quoting it and causing offense? O-o
I was under the impression that this was somewhat of an academic environment where such references as this would not be taken as a condescending action, instead, taken as a point of clarification and point of reference for a common understanding.

As for the question of (re)iteration:
Please read: http://www.merriam-webster.com/dictionary/iteration
1: the action or a process of iterating or repeating: as
a : a procedure in which repetition of a sequence of operations yields results successively closer to a desired result
b : the repetition of a sequence of computer instructions a specified number of times or until a condition is met — compare recursion
Your code meets this definition twice:
The outer "for...next " and then the inner A/R code. The fact that with each rejection of found record triggers another round of "seeking" wherein the command must then (re)read each record's [Probability] value, is a re-iteration not only of the code but of also each record. And while we’re at it, the “Seek” command contains a hidden third iteration of actions as the “Seek” is nothing more than a nice wrapper around the commands to move to a record, read a specific field value, and compare that result with a condition and repeat the process until the condition is met… the first time… then stop at that record’s position in the data table for further manipulation.

One assumes with “Seek” that the first record meeting the criteria is either the only one to do so or it will be “good enough.”

Now that I have established that in fact that there is an actual (re)iteration of the commands within your code lets tweak it.
Instead of using the "goto ProcessStart" to jump back to a label in the code, a do-until loop would be preferred. The “goto” “gosub” commands that were all too common in BASIC are now frowned upon as poor coding within VBA (although, we still use them for error trapping – go figure). So to use the loop one might code as follows (once again, using the code from your example):

dsatino: Please skip this code block as you already know and understand what modifications would be needed to tweak the code for a loop.)


Expand|Select|Wrap|Line Numbers
  1. For sLoop = 1 To smplSz
  2.     Do ''(<<<<<<Start of mod)
  3.         'set j ******
  4.         j = Int(Rnd() * recCt) + 1
  5.         '
  6.         'set Wj *****
  7.         '
  8.         With rst
  9.             '.Index = "nKey"
  10.             .Seek "=", j
  11.             Wj = !probability
  12.         End With
  13.         '
  14.         'set Pj and Uj *****
  15.         Pj = Wj / Wmax
  16.         Uj = Rnd()
  17.         'test for rejection *****
  18.     Loop Until Uj <= Pj ''(<<<<<<end of mod)
  19.     '
  20.     With rst2
  21.         .AddNew
  22.         .Fields(0) = j
  23.         .Update
  24.     End With
  25. '
  26. Next sLoop
Now, one should be more able to more clearly see the two iterations within your code.

Hey... while looking at that... why are we pulling a new random number each time we have a record fail the A/R test... just keep going thru the records until we find a passing test. SO... pull the two RND out of the inner loop to the outer for...next ... so much more I can do here.

On the next note:
While your example might very well use a unquie and indexed [Probability] (oh, I see that in the table it isn't) the [dblProbability] from Smilie's post (#5) very clearly does not require unique values and may not be indexed; thus, seeking on this field may not return a truely random dataset... in fact, using #5 again, if the test is such that a weight of 4 will meet the A/R requirements then when the test is ran the seek command will always return "Shoreh" as the name and will never return "Van Sas" (provided the records are in the order shown).

And while I’m at it:

The query you created returns way too many records in that it returns all of the records from the table “namepickt est” and the subsequent records against table “sample”:
(I pull it out here for those individules that can not open your example… dsatino, once again, please skip this code block as you already know and understand it.)
Expand|Select|Wrap|Line Numbers
  1. SELECT Sample.nKey, NamePickTest.Name, Count(Sample.nKey) AS InSample, NamePickTest.Occurrence, NamePickTest.Probability
  2. FROM Sample RIGHT JOIN NamePickTest ON Sample.nKey = NamePickTest.nKey
  3. GROUP BY Sample.nKey, NamePickTest.Name, NamePickTest.Occurrence, NamePickTest.Probability;
I believe that you wanted:
Expand|Select|Wrap|Line Numbers
  1. SELECT NamePickTest.nKey, NamePickTest.Name, NamePickTest.Occurrence, NamePickTest.Probability
  2. FROM NamePickTest INNER JOIN Sample ON NamePickTest.nKey = Sample.nKey;
The final issue I have with your example is [Probability] wherein you have stored the calculated value (which from post #5 would not be unique)... now this will certainly speed things up; however, IMHO not the best way... what happens when max[Frequency] is no longer the value you've chosen to pre-calculate [Probability]? (Of course, TheSmileyCoder stated that the values would <not> be updated... we must allow that for others with a similuar question may not have such static information. It may also be very well that for Smiley, the current static info may indeed need to be updated at a latter date)

It is a simple matter to find max[Frequency]... even a large dataset, store that value as a variable in the code and then do the calculation.... such as using the dreaded DMax or you could run a query such as (once again using your example):
Expand|Select|Wrap|Line Numbers
  1. SELECT SampleResult.nKey, SampleResult.Name, SampleResult.InSample, SampleResult.Occurrence, SampleResult.Probability
  2. FROM SampleResult 
  3. INNER JOIN 
  4. (SELECT Max(SampleResult.[Occurrence]) AS MaxOfOccurrence
  5. FROM SampleResult) as [maxq]
  6. ON SampleResult.Occurrence = [maxq].MaxOfOccurrence;
for fun, I pulled all of the fields; however, you really only need the "[SampleResult].[Occurrence]" in the outer select.

Actually… in this case, if I were using a query, I’d use just that inner select:
Expand|Select|Wrap|Line Numbers
  1. SELECT Max(SampleResult.[Occurrence]) AS MaxOfOccurrence
  2. FROM SampleResult);
to return the max value as the rest of the information is of no value for the subsequent code.

-z
Jun 8 '12 #23
zmbd
5,501 Recognized Expert Moderator Expert
RABBIT
and
TheSmileyCoder
:
The range thing really took some effort...
Rabbit, taking your queries and my weighted probability calculations forcing them together:

Expand|Select|Wrap|Line Numbers
  1. SELECT Q1.txtName, Q1.dblProbability, Q1.frequency, startRange - Q1.frequency AS endRange, (SELECT sum(SQ3.frequency) AS Total
  2. FROM qry_name AS SQ3
  3. WHERE SQ3.dblProbability <= Q1.dblProbability) AS startRange
  4. FROM qry_Name AS Q1, qry_Name AS Q2
  5. WHERE (((Q1.Frequency)>=Q2.Frequency) And ((Q1.txtName)>=Q2.txtName))
  6. GROUP BY Q1.txtName, Q1.dblProbability, Q1.frequency
  7. ORDER BY Q1.frequency DESC;
Of note is the startRange and endRange... I used a variation on a theme for "running totals" for the startRange that I have used in other situations... don't remember where I got the method from; however, it was somewhere out there in the cloud - anyway, not my original idea and hopefully the author will forgive the lack of proper citation.

I'm still having some issues with:
Expand|Select|Wrap|Line Numbers
  1. SELECT qry_weighted_w_range.txtname
  2. FROM qry_weighted_w_range, tbl_SampleSize
  3. WHERE (((tbl_SampleSize.n)<=50) AND ((Rnd(-1*[n]*Second(Now())*Minute(Now())))>=[EndRange] And (Rnd(-1*[n]*Second(Now())*Minute(Now())))<[StartRange]));
consistently returning the desired number of records. Note that (tbl_SampleSize .n)<=50) will occasionally return 47ish to 56ish records; however, I think that is due to the distribution in the names table (basically the frequency of the English alphabet as used in typical American English) allowing for roughly 10 pairs of letters (names) with the same frequency...

Attached the database for your amusement... the query is set to return the 10K as per your original - quite fast.

Now I see why I used the VBA methodology instead of trying the SQL approach; however, I do like the fact that this will run without code enabled.

Hopefully Smilie will be able to use something here.

-z
Attached Files
File Type: zip random_pick.zip (118.3 KB, 137 views)
Jun 12 '12 #24
dsatino
393 Contributor
zmbd

There was no anger. You directed your response specifically to me and given that much of the code on this site is not original work I thought I would save you the trouble of pointing it out.

As for argument pertaining to 'reiteraton', I do find it humorous you felt the need to post the definition and subsequently posted the definition of 'iteration'. From a computing standpoint, they are not the same.

The For sLoop=...Next Loop is the act of iterating through the values of sLoop. If, for example, the value produced for sLoop=10 is rejected, then the process is reiterated for sLoop=10.

Also, your interpretation of how the code and the .seek method work is not correct. At no time does this code 'read each records probability value'. It randomly picks a single record, navigates to that record using seek, and then captures the probability value of that record.

The .seek method does not move through recordsets making comparisons at each record, that sounds more like the .find methods, rather it uses an index value to move to a specific record. These are not the same thing. You are only correct in that it will stop at the first record to have that index value, but as i previously noted, the index that I'm using is unique so that can't occur in the example I provided.

Then there's this:
"Hey... while looking at that... why are we pulling a new random number each time we have a record fail the A/R test... just keep going thru the records until we find a passing test. SO... pull the two RND out of the inner loop to the outer for...next ... so much more I can do here."

If you understood what the code was doing you'd know why we are picking a new random number. First, the random number "j" is not a random number, rather it's a random record. Second, the code is not iterating through the record set testing for success and even if it was, moving to the next record and testing for a pass/ fail would make the order of the records important which would negate the randomness.

As for the query I created, it's exactly what I wanted. I know it returns all of the names.

Then there's the issue of storing the values. It's true the counts would allow it to be more extensible, but this was an example. At some point, people have to use their own brain. Mostly I find it humorous that your knocking the stored values, but the example you provided requires a table to store the amount of results you want. My example produced a sample of 100,000 in under 4 seconds. Your example can't even produce that without adding 85,000 records to the table. What if I want a random sample size of a half-million or more. In your example I need a table with at least that many records, just for the sake of being able to pull that sample.

Lastly, I have no idea why you felt the need to point out how to store the max value in a variable, since that's what I did.
Jun 19 '12 #25
NeoPa
32,584 Recognized Expert Moderator MVP
I will continue to monitor this thread, but I hope that this discussion can get back on track. Involved members should check their PMs first though.
Jun 19 '12 #26

Sign in to post your reply or Sign up for a free account.

Similar topics

5
14956
by: Derek Cooper | last post by:
I hope you can help me. I posted this in the microsoft sql server newsgroup a few days ago and got no response so I thought I'd try here. If I can provide any clarification I'll be glad to do so. I'm trying to calculate a column based on the value of the previous record. I'm not very experienced with SQL-Server. I'm using the following table: CREATE TABLE tblPayment (
1
2667
by: arthur-e | last post by:
How can you select records based on more than one combo box - I have a combobox that selects records based on name (I'm sure this has been asked a thousand times - web site answer/link could be helpful too; but I'm so bad with syntax that specifics will be MOST helpful) SELECT DISTINCT ., . FROM Union Select "<ALL>" , NULL From ;
1
1137
by: zeeshansohail | last post by:
I have two tables named as “COMPANY and BILL_DETAIL” with the following structure. BILL_DETAIL TABLE Name Type Nullable ------------- -------------- -------- BILL_ID NUMBER(7) S_HEAD_CODE VARCHAR2(12) SR_NO NUMBER(9) BILL_AMOUNT NUMBER(10) Y
4
3582
by: mcca0081 | last post by:
hi - i'm trying to delete one of the duplicate records based on the most current date. here's the code for my access 2000 db. any help would be appreciated!!! - thank you kindly Sub DeleteDuplicateRecords() ' Deletes duplicates from the specified table, keeping the most current received date record. ' No user confirmation is required.
2
2565
by: Shivajirp | last post by:
I have table in which Shift no, Start time. End time. if start time of shift is 6pm and end time of shift is 6 am then I want to select record between start time and end time.I need query to selct such record.
0
2391
by: TD | last post by:
I have a main form with two subforms (both in datasheet view), neither of which are linked to the main form. The main form is based on a query that uses the bound column of a combobox on the main form as the criteria for the query. The combobox is based on a query that retrieves the name and record id of the customers. In the afterUpdate event of the combobox on the main form and both subforms are requeried. The first subform is based...
5
1933
by: agarwasa2008 | last post by:
Hi, I would like to delete a record based on a user entered string. Here are the details. I have a txtFind textbox. A string is entered by the user. Based on that string value it displays that one record in the fields in the frmDeleteComponent form. There is also a lstDelete listBox which displays all the record. But for some reason it does not delete that particular
6
4076
by: mukeshrasm | last post by:
Hi I want to update records in database based on selected records using checkbox. if user does not select a record and clicks update button it should show the message please select record to be update if user selects say 3 records then there should be message are you sure to update 3 records and when clicks ok records should get updated. since records are coming dynamically so how can I get the number of checkboxes selected or not to...
4
2631
by: Manikrag | last post by:
Hi Team, Is it possible to sort select query based on input string? I am looking for somthing like: select TOP 20 PREFERRED_NAME from FRS_TABLE where Lower(PREFERRED_NAME) like Lower('%shar%') order by PREFERRED_NAME LIKE '%shar%'
11
5696
by: prashantdixit | last post by:
Hi, I am developing a stock control software. Iam stuck somewhere. I have a form "Add new stock" consisting of combobox, text boxes etc. which is used to add records in a table. I have another form which has "Add Similar To" button. The table which i used to store records contains a field call "Add More Quantity". If a user enters any numerical value in this field against any record and click on "Add Similar To" button then same "Add New...
0
9933
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9781
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10734
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
10407
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9564
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
7114
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5982
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4606
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
4210
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.