By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
424,456 Members | 1,400 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 424,456 IT Pros & Developers. It's quick & easy.

How do I select random subjects from different groups in one table?

lilp32
P: 43
I am trying to write a program to randomly select 20 subjects from 20 groups. All groups and subjects are in one table. The variable for group is SecondaryID and the variable for subject is SubjectID. I found some code online that works for the random selection but have been unable to modify the code to do this for each SecondaryID. Any suggestions would be much appreciated.

Thanks!

Expand|Select|Wrap|Line Numbers
  1. Option Compare Database
  2. Sub PickRandom()
  3.     Dim db As Database
  4.     Dim tdf As TableDef
  5.     Dim fld As Field
  6.     Dim rst As Recordset
  7.     Dim strSQL As String
  8.     Dim strTableName As String
  9.  
  10. ' 1: Create a new temporary table containing the required fields
  11.     strSQL = "SELECT tblDATA.SecondaryID, tblDATA.SubjectID " & _
  12.              "INTO tblTemp " & _
  13.              "FROM tblDATA;"
  14.     DoCmd.SetWarnings False
  15.     DoCmd.RunSQL strSQL
  16.     DoCmd.SetWarnings True
  17.  
  18. ' 2: Add a new field to the new table
  19.     Set db = CurrentDb()
  20.     Set tdf = db.TableDefs("tblTemp")
  21.     Set fld = tdf.CreateField("RandomNumber", dbSingle)
  22.     tdf.Fields.Append fld
  23.  
  24. ' 3: Place a random number in the new field for each record
  25.     Set rst = db.OpenRecordset("tblTemp", dbOpenTable)
  26.     rst.MoveFirst
  27.     Do
  28.         Randomize
  29.         rst.Edit
  30.             rst![RandomNumber] = Rnd()
  31.         rst.Update
  32.         rst.MoveNext
  33.     Loop Until rst.EOF
  34.     rst.Close
  35.     Set rst = Nothing
  36.  
  37. ' 4: Sort the tblTemp by the random number and move the top 20 into a new table
  38.     strTableName = SecondaryID & Format(Date, "ddmmmyyyy")
  39.     strSQL = "SELECT TOP 20 tblTemp.secondaryID, tblTemp.SubjectID " & _
  40.              "INTO " & strTableName & " " & _
  41.              "FROM tblTemp " & _
  42.              "ORDER BY tblTemp.RandomNumber;"
  43.     DoCmd.SetWarnings False
  44.     DoCmd.RunSQL strSQL
  45.     DoCmd.SetWarnings True
  46.  
  47. ' 5: Delete the temporary table
  48.     db.TableDefs.Delete ("tblTemp")
  49. End Sub
  50.  
Jan 17 '12 #1

✓ answered by Rabbit

Is it numeric? It would have to be numeric. Note the difference in the code.
Expand|Select|Wrap|Line Numbers
  1. SELECT [t1].SecondaryID, [t1].SubjectID 
  2. FROM   ( 
  3.      SELECT *  
  4.           , Rnd(-Minute(Now()) * Second(Now()) * [UNIQUEID]) As RandNo  
  5.      FROM   [tblDATA] 
  6. ) AS [t1] 
  7. INNER JOIN ( 
  8.      SELECT *  
  9.           , Rnd(-Minute(Now()) * Second(Now()) * [UNIQUEID]) As RandNo  
  10.      FROM   [tblDATA] 
  11. ) AS [t2] 
  12. ON [t1].SecondaryID = [t2].SecondaryID 
  13.    AND [t1].RandNo >= [t2].RandNo 
  14. GROUP BY [t1].SecondaryID, [t1].SubjectID 
  15. HAVING COUNT(*) <= 20 

Share this Question
Share on Google+
39 Replies


100+
P: 144
The code works for me. I noticed the var SecondaryID for part of the Tablename is empty (not initialized). I'll check it somemore though. (can't now). Check data types (I just used "long integer") and check spelling, etc. Look for a table with just the date.
Jan 17 '12 #2

NeoPa
Expert Mod 15k+
P: 31,186
It's not very clear what you even want, but I'll guess you have a table with records split across 20 possible groups, and you're looking to select one subject from each of these groups, randomly. If that doesn't express it properly then you need to try again yourself, because without a question that makes sense, answers are hard to come by.

As you also supply no information to indicate what format Subject and Group values are in I suppose we should assume it's textual. As random data is produced using the Rnd() function, which returns a value such that 0 <= X < 1, we need, somehow, to convert such a value into something that will select one of n records for each Group value, where n is the number of records for that Group.

I would suggest Recordset processing might be used to determine the actual record to select. Determine the number of records for each Group (DCount() could be used for this or processing through the recordset in order counting the matching records) then multiply this by the generated random number and round back to an integer value. This will be the number of records to move on to from the first of the Group.

You can produce a list of PK IDs to include within an In() list or process them as you find them.
Jan 17 '12 #3

ADezii
Expert 5K+
P: 8,597
You do not need to Create a Temporary Table, Create a Field, Sort the Temporary Table, then Delete the Temporary Table. The following Code, based on NeoPa's idea of Recordset processing, will pick a Random Subject for each Unique Group, based on the Number of Subjects for that Group. I've posted some Sample Data, the Code Logic, as well as Results for 3 Trial Runs:
  1. tblData
    Expand|Select|Wrap|Line Numbers
    1. SecondaryID    SubjectID
    2. Group A            A1
    3. Group A            A2
    4. Group A            A3
    5. Group D            D1
    6. Group X            X1
    7. Group X            X2
    8. Group X            X3
    9. Group X            X4
    10. Group D            D2
    11. Group D            D3
    12. Group D            D4
    13. Group D            D5
    14. Group D            D6
    15. Group D            D7
    16. Group D            D8
    17. Group D            D9
    18. Group D            D10
    19. Group X            X5
    20.  
  2. Code Logic
    Expand|Select|Wrap|Line Numbers
    1. Dim MyDB As DAO.Database
    2. Dim rstUniqueGroups As DAO.Recordset
    3. Dim rstSubjectsForGroup As DAO.Recordset
    4. Dim strSQL1 As String
    5. Dim strSQL2 As String
    6. Dim intNumOfRecs As Integer     'Number of Subjects per Group
    7.  
    8. Randomize
    9.  
    10. Set MyDB = CurrentDb
    11.  
    12. 'Recordset representing Distinct Groups in tblData
    13. strSQL1 = "SELECT DISTINCT [SecondaryID] FROM tblData"
    14.  
    15. Set rstUniqueGroups = MyDB.OpenRecordset(strSQL1, dbOpenSnapshot, dbOpenForwardOnly)
    16.  
    17. With rstUniqueGroups
    18.   Do While Not .EOF
    19.     'Recordset representing Subjects for each Group in tblData
    20.     strSQL2 = "SELECT * FROM tblData WHERE [SecondaryID] = '" & ![SecondaryID] & "';"
    21.     Set rstSubjectsForGroup = MyDB.OpenRecordset(strSQL2, dbOpenSnapshot)
    22.       rstSubjectsForGroup.MoveLast: rstSubjectsForGroup.MoveFirst
    23.       intNumOfRecs = rstSubjectsForGroup.RecordCount
    24.         rstSubjectsForGroup.Move Int(Rnd * intNumOfRecs)
    25.         Debug.Print "Group: " & ![SecondaryID] & vbCrLf & "Number of Records: " & intNumOfRecs & _
    26.                      vbCrLf & "Random Subject for " & ![SecondaryID] & ": " & rstSubjectsForGroup![SubjectID]
    27.       Debug.Print "**********************************"
    28.     .MoveNext
    29.   Loop
    30. End With
    31.  
    32. rstUniqueGroups.Close
    33. rstSubjectsForGroup.Close
    34. Set rstUniqueGroups = Nothing
    35. Set rstSubjectsForGroup = Nothing
  3. Results
    Expand|Select|Wrap|Line Numbers
    1. Group: Group A
    2. Number of Records: 3
    3. Random Subject for Group A: A3
    4. **********************************
    5. Group: Group D
    6. Number of Records: 10
    7. Random Subject for Group D: D5
    8. **********************************
    9. Group: Group X
    10. Number of Records: 5
    11. Random Subject for Group X: X2
    12. **********************************
    13.  
    14. Group: Group A
    15. Number of Records: 3
    16. Random Subject for Group A: A2
    17. **********************************
    18. Group: Group D
    19. Number of Records: 10
    20. Random Subject for Group D: D4
    21. **********************************
    22. Group: Group X
    23. Number of Records: 5
    24. Random Subject for Group X: X5
    25. **********************************
    26.  
    27. Group: Group A
    28. Number of Records: 3
    29. Random Subject for Group A: A1
    30. **********************************
    31. Group: Group D
    32. Number of Records: 10
    33. Random Subject for Group D: D2
    34. **********************************
    35. Group: Group X
    36. Number of Records: 5
    37. Random Subject for Group X: X1
    38. **********************************
Jan 18 '12 #4

NeoPa
Expert Mod 15k+
P: 31,186
Let me start by congratulating ADezii on some good code that produces the correct results. I also knocked up some code to illustrate how an idea, very similar to my previously posted one, could work without the need for so many opens of recordsets (as this is relatively expensive of resources). It uses the concept of processing through the table until a Group's records have been passed, then moving back a random numer of records for the selected record, then forward again to the first of the next Group to continue onwards. I haven't a rig to test it on but I would expect it to work efficiently :
Expand|Select|Wrap|Line Numbers
  1. Dim cdb As DAO.Database
  2. Dim strSQL As String, strGroup As String
  3. Dim lngNumSubs As Long, lngRand As Long
  4. Dim blnCheck As Boolean
  5.  
  6. Call Randomize
  7. Set cdb = CurrentDb
  8. strSQL = "SELECT   * " & _
  9.          "FROM     [tblDATA] " & _
  10.          "ORDER BY [SecondaryID]"
  11.  
  12. With cdb.OpenRecordset(strSQL, dbOpenSnapshot, dbReadOnly)
  13.     Do
  14.         If .EOF Then
  15.             blnCheck = True
  16.         ElseIf .SecondaryID <> strGroup Then
  17.             blnCheck = True
  18.         End If
  19.         If blnCheck Then
  20.             lngRand = lngNumSubs - Int(Rnd() * lngNumSubs)
  21.             Call .Move(Rows:=-lngRand)
  22.             Debug.Print "Group: " & strGroup & vbCrLf & _
  23.                         "Number of Records: " & lngNumSubs & vbCrLf & _
  24.                         "Random Subject: " & .SubjectID & vbCrLf & _
  25.                         "**********************************"
  26.             Call .Move(Rows:=lngRand)
  27.             If Not .EOF Then
  28.                 strGroup = .SecondaryID
  29.                 lngNumSubs = 0
  30.                 blnCheck = False
  31.             End If
  32.         End If
  33.         If .EOF Then Exit Do
  34.         lngNumSubs = lngNumSubs + 1
  35.         Call .MoveNext
  36.     Loop
  37. End With
PS. I should point out I used ADezii's code as a starting position. It's only changed in certain places.
Jan 18 '12 #5

Rabbit
Expert Mod 10K+
P: 12,315
Here's a SQL version
Expand|Select|Wrap|Line Numbers
  1. SELECT T1.*
  2. FROM Table1 T1
  3.    INNER JOIN (
  4.       SELECT SecondaryID, MAX(Rnd(-1 * DatePart("s", NOW()) * ASC(SecondaryID) * RIGHT(SubjectID, 1))) AS Expr1
  5.       FROM Table1
  6.       GROUP BY SecondaryID
  7.    ) T2
  8.    ON T1.SecondaryID = T2.SecondaryID
  9. WHERE T2.Expr1 = Rnd(-1 * DatePart("s", NOW()) * ASC(T1.SecondaryID) * RIGHT(T1.SubjectID, 1));
It would be preferable to replace the stuff in the Rnd with a unique ID.
Jan 18 '12 #6

NeoPa
Expert Mod 15k+
P: 31,186
This is really, really, clever.

Unfortunately, I'd take some convincing of the randomness of [Expr1], effected as it is by the values of the two fields within the calculation. A unique ID would also suffer from the same effect in my view. Some very smart thinking behind the concept though.

PS. I'm open to question if you disagree Rabbit.
PPS. Maybe if the multiplication by any of the items (or parts thereof) found within the record were added to the parameter part of the Rnd() call instead of outside of it, the process might work perfectly?
Jan 18 '12 #7

ADezii
Expert 5K+
P: 8,597
Like you, NeoPa, I found Rabbit's approach to be very cleaver, not to mention intriguing, so I made 10 Trial Runs on the SQL for curiosity sake. The Results are as follows:
Expand|Select|Wrap|Line Numbers
  1. Group A -  3 Subjects
  2. Group D - 10 Subjects
  3. Group X -  5 Subjects
  4.  
  5. A1 - 20%
  6. A2 - 60%
  7. A3 - 20%
  8.  
  9. D1 - 10%
  10. D2 - 30%
  11. D3 - 10%
  12. D4 -  0%
  13. D5 -  0%
  14. D6 - 20%
  15. D7 - 30%
  16. D8 -  0%
  17. D9 -  0%
  18. D10 - 0%
  19.  
  20. X1 - 20%
  21. X2 - 60%
  22. X3 - 20%
  23. X4 -  0%
  24. X5 -  0%
  25.  
P.S. - With a greater number of Runs, I would assume that the outcome would have been more evenly distributed.
Jan 18 '12 #8

lilp32
P: 43
Thanks everyone for your responses. I apologize for the lack of information - I am pretty new to this. I realize that my question was unclear - I would actually like to select 20 random subjects for EACH group for a total of 400. I got the code by ADezii to work for selecting one for each group; the code by NeoPa gives me an object required error at line 16 (strGroup). Thanks again.
Jan 18 '12 #9

NeoPa
Expert Mod 15k+
P: 31,186
@ADezii.
Your test data, where the first character of each [SecondaryID] value is the same for all, would not illustrate what I think is a shortcoming in Rabbit's code. Even if it did, the effect of it would be very hard to notice. It's actually easier to spot in the logic than in the results - as the Random results would tend to obscure any, otherwise noticeable, results.

@Lilp32.
Firstly thank you and congratulations for responding to the posts. It always helps and keeps up the interest.
I cannot easily think what the error could be referring to but here are some possibilities :
  1. Line #2 wasn't included in your test so strGroup was unrecognised as a string variable.
  2. tblDATA.SecondaryID is some form of object that doesn't translate to a string. I wrote the code to fit ADezii's scenario (where both fields are string types). Some changes might be required if your actual situation doesn't match that. As yet we don't really know that situation very clearly.
Frankly neither seems particularly likely. Even if tblDATA.SecondaryID were a number it would convert the data automatically and still work, and why you would miss some of the code out I cannot imagine.

On to your expanded/clarified question.
This makes life much more complicated. I would expect ADezii's approach to be more expandable to suit that if I'm honest, although the following routine might be incorporated into my code to produce the same results :
Expand|Select|Wrap|Line Numbers
  1. 'Returns a string representing multiple moves through the recordset to select up
  2. 'to 20 random records.  Always ends up at start of next Group.
  3. Private Sub Randomise(ByRef strResult As String, ByVal lngCount As Long)
  4.     Dim lngMax As Long, lngA As Long, lngX As Long, lngY As Long, lngZ As Long
  5.     Dim alngRecs() As Long
  6.  
  7.     ReDim alngRecs(lngCount - 1) As Long
  8.     lngMax = 20
  9.     If lngMax > lngCount Then lngMax = lngCount
  10.     For lngX = 1 To lngCount
  11.         alngRecs(lngX - 1) = lngX
  12.     Next lngX
  13.     strResult = ""
  14.     For lngX = 0 To lngMax - 1
  15.         lngY = Rnd() * lngCount + lngX
  16.         lngZ = alngRecs(lngY)
  17.         alngRecs(lngY) = alngRecs(lngX)
  18.         alngRecs(lngX) = lngZ
  19.         strResult = strResult & "," & lngZ - lngA
  20.         lngA = lngZ
  21.     Next lngX
  22.     strResult = Mid(strResult, 2) & lngZ
  23. End Sub
Jan 18 '12 #10

NeoPa
Expert Mod 15k+
P: 31,186
Alternatively, Rabbit's code could be manipulated to produce :
Expand|Select|Wrap|Line Numbers
  1. SELECT *
  2.      , Rnd(-Minute(Now()) * Second(Now()) * Asc([SecondaryID]) * Asc(Right([SubjectID], 1))) As RandNo
  3. FROM   [tblDATA] AS [tD]
  4. WHERE [RandNo] In(SELECT   TOP 20
  5.                            Rnd(-Minute(Now()) * Second(Now()) * Asc([SecondaryID]) * Asc(Right([SubjectID], 1)))
  6.                   FROM     [tblDATA]
  7.                   WHERE    (tblDATE.SecondaryID = tD.SecondaryID)
  8.                   ORDER BY Rnd(-Minute(Now()) * Second(Now()) * Asc([SecondaryID]) * Asc(Right([SubjectID], 1))))
Jan 18 '12 #11

NeoPa
Expert Mod 15k+
P: 31,186
Incorporating this new code (Apologies for the similarity between Randomize and Randomise() by the way) was not as local a change as I'd hoped, so I need to repost the whole lot. Most of the changes (apart from a few changes to the Dims) were between the original lines #20 and #26 though :
Expand|Select|Wrap|Line Numbers
  1. Dim cdb As DAO.Database
  2. Dim strSQL As String, strGroup As String, strResult As String
  3. Dim lngNumSubs As Long, lngX As Long
  4. Dim blnCheck As Boolean
  5. Dim varResults As Variant
  6.  
  7. Call Randomize
  8. Set cdb = CurrentDb
  9. strSQL = "SELECT   * " & _
  10.          "FROM     [tblDATA] " & _
  11.          "ORDER BY [SecondaryID]"
  12.  
  13. With cdb.OpenRecordset(strSQL, dbOpenSnapshot, dbReadOnly)
  14.     Do
  15.         If .EOF Then
  16.             blnCheck = True
  17.         ElseIf .SecondaryID <> strGroup Then
  18.             blnCheck = True
  19.         End If
  20.         If blnCheck Then
  21.             Debug.Print "Group: " & strGroup & vbCrLf & _
  22.                         "Number of Records: " & lngNumSubs
  23.             Call Randomise(strResult, lngNumSubs)
  24.             varResults = Split(strResult, ",")
  25.             For lngX = 0 To lngNumSubs - 1
  26.                 Call .Move(Rows:=-varResults(lngX))
  27.                 Debug.Print "Random Subject: " & .SubjectID
  28.             Next lngX
  29.             Call .Move(Rows:=varResults(lngNumSubs))
  30.             Debug.Print "**********************************"
  31.             If Not .EOF Then
  32.                 strGroup = .SecondaryID
  33.                 lngNumSubs = 0
  34.                 blnCheck = False
  35.             End If
  36.         End If
  37.         If .EOF Then Exit Do
  38.         lngNumSubs = lngNumSubs + 1
  39.         Call .MoveNext
  40.     Loop
  41. End With
Jan 18 '12 #12

lilp32
P: 43
Thanks again. I changed the SecondaryID to be a text field just in case. If it helps, the subject ID field is a number and numbers can be repeated across sites but not in the same site.

When I run this code I get the following compile error: "Wrong number of arguments or invalid property assignment" at line 23 (I changed randomise to randomize).

Expand|Select|Wrap|Line Numbers
  1. Dim cdb As DAO.Database
  2. Dim strSQL As String, strGroup As String, strResult As String
  3. Dim lngNumSubs As Long, lngX As Long
  4. Dim blnCheck As Boolean
  5. Dim varResults As Variant
  6.  
  7. Call Randomize
  8. Set cdb = CurrentDb
  9. strSQL = "SELECT   * " & _
  10.          "FROM     [tblDATA] " & _
  11.          "ORDER BY [SecondaryID]"
  12.  
  13. With cdb.OpenRecordset(strSQL, dbOpenSnapshot, dbReadOnly)
  14.     Do
  15.         If .EOF Then
  16.             blnCheck = True
  17.         ElseIf tbldata.SecondaryID <> strGroup Then
  18.             blnCheck = True
  19.         End If
  20.         If blnCheck Then
  21.             Debug.Print "Group: " & strGroup & vbCrLf & _
  22.                         "Number of Records: " & lngNumSubs
  23.             Call Randomize(strResult, lngNumSubs)
  24.             varResults = Split(strResult, ",")
  25.             For lngX = 0 To lngNumSubs - 1
  26.                 Call .Move(Rows:=-varResults(lngX))
  27.                 Debug.Print "Random Subject: " & .SubjectID
  28.             Next lngX
  29.             Call .Move(Rows:=varResults(lngNumSubs))
  30.             Debug.Print "**********************************"
  31.             If Not .EOF Then
  32.                 strGroup = tbldata.SecondaryID
  33.                 lngNumSubs = 0
  34.                 blnCheck = False
  35.             End If
  36.         End If
  37.         If .EOF Then Exit Do
  38.         lngNumSubs = lngNumSubs + 1
  39.         Call .MoveNext
  40.     Loop
  41. End With
  42.  
Jan 18 '12 #13

ADezii
Expert 5K+
P: 8,597
@NeoPa:
I'm a little slow today so bear with me. How does your logic ensure that you end up with 20 Random, Unique, Subjects per Group. My question pertains to the 'Uniqueness' of the 20 Values. Thanks.
Jan 18 '12 #14

NeoPa
Expert Mod 15k+
P: 31,186
Lilp32:
When I run this code I get the following compile error: "Wrong number of arguments or invalid property assignment" at line 23 (I changed randomise to randomize).
That wasn't too clever as there is already a keyword Randomize. It's even used earlier in the code. Feel free to rename the procedure, but ensure it's not to an already-used keyword if you do.

PS. Have you tried out the solution proposed in post #11 yet. That may prove to be a much easier solution.
Jan 18 '12 #15

NeoPa
Expert Mod 15k+
P: 31,186
@ADezii.
To give a full explanation would take a tome, but the key is in populating the array originally with IDs in normal order then, whenever an item is selected, switching that with the first available element. Next run through creates a Random number which skips all the previously selected elements. The thing to look at closely is the + lngX of line #15 in post #10. Remember, multiplication has a higher arithmetic precedence than addition, so this is added to the product of the random number and lngCount producing a random value between lngX and lngCount - 1. All the previously used elements are found between index 0 and lngX - 1.

If you need more explanation I'll happily explain it, but by talking you through it. It's too complex to explain without checking you understand each step as we go. I'd be happy to call you, and I can now do it on Skype without the dreaded Caller Number Witheld message.

At the moment I have no test bed to work with, but if you'd like me to call then pick a time and I'll set one up beforehand to help with the illustrations.
Jan 18 '12 #16

Rabbit
Expert Mod 10K+
P: 12,315
Honestly, I've never had to get a random sample and I too sometimes wonder about the randomness.

I strongly suspect that you would have to derive a sufficiently large number based on time and a unique id and then use a mod to achieve better randomness.

P.S. By sufficiently large, I mean a number that would overflow the maximum seed, which could cause issues as far as Jet SQL is concerned. But perhaps a smaller mod could work as well. I don't know enough about the implementation of Rnd.
Jan 18 '12 #17

Rabbit
Expert Mod 10K+
P: 12,315
As for post #11, I prefer a join approach over a subquery in terms of speed.
Expand|Select|Wrap|Line Numbers
  1. SELECT [t1].SecondaryID, [t1].SubjectID
  2. FROM   (
  3.      SELECT * 
  4.           , Rnd(-Minute(Now()) * Second(Now()) * Asc([SecondaryID]) * Asc(Right([SubjectID], 1))) As RandNo 
  5.      FROM   [tblDATA]
  6. ) AS [t1]
  7. INNER JOIN (
  8.      SELECT * 
  9.           , Rnd(-Minute(Now()) * Second(Now()) * Asc([SecondaryID]) * Asc(Right([SubjectID], 1))) As RandNo 
  10.      FROM   [tblDATA]
  11. ) AS [t2]
  12. ON [t1].SecondaryID = [t2].SecondaryID
  13.    AND [t1].RandNo >= [t2].RandNo
  14. GROUP BY [t1].SecondaryID, [t1].SubjectID
  15. HAVING COUNT(*) <= 20
Jan 18 '12 #18

Rabbit
Expert Mod 10K+
P: 12,315
I'm currently running this test code
Expand|Select|Wrap|Line Numbers
  1. Option Compare Database
  2. Option Explicit
  3.  
  4. Private Type SYSTEMTIME
  5.     wYear As Integer
  6.     wMonth As Integer
  7.     wDayOfWeek As Integer
  8.     wDay As Integer
  9.     wHour As Integer
  10.     wMinute As Integer
  11.     wSecond As Integer
  12.     wMilliseconds As Integer
  13. End Type
  14.  
  15. Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
  16. Private Declare Sub GetSystemTime Lib "kernel32" (lpSystemTime As SYSTEMTIME)
  17.  
  18. Public Function TimeToMillisecond() As Integer
  19.     Dim tSystem As SYSTEMTIME
  20.     On Error Resume Next
  21.     GetSystemTime tSystem
  22.     TimeToMillisecond = tSystem.wMilliseconds
  23. End Function
  24.  
  25. Sub TestRandomness()
  26.     Dim runAvg As Double, i As Long, uniqueID As Long
  27.  
  28.     For uniqueID = 1 To 10
  29.         runAvg = 0
  30.  
  31.         For i = 1 To 100000
  32.             runAvg = runAvg + Rnd(-1 * TimeToMillisecond * uniqueID)
  33.             Sleep (1)
  34.         Next i
  35.  
  36.         Debug.Print uniqueID & " :: " & Round(runAvg / 100000, 4)
  37.     Next uniqueID
  38. End Sub
If it's random enough, the average for each uniqueID should approach 0.5.
Jan 18 '12 #19

ADezii
Expert 5K+
P: 8,597
I expanded my original Code in order to generate 20 Random/Unique Subjects per Group, but unfortunately it got a little too complex and unwieldy. I'll post it anyway for reference sake. It has been tested and is fully operational. I populated tblData with exactly 20 Subjects per Group to make sure that all the Subjects were Random and Unique for each Group. Results are also posted.
Expand|Select|Wrap|Line Numbers
  1. Dim MyDB As DAO.Database
  2. Dim rstUniqueGroups As DAO.Recordset
  3. Dim rstSubjectsForGroup As DAO.Recordset
  4. Dim strSQL1 As String
  5. Dim strSQL2 As String
  6. Dim intNumOfRecs As Integer     'Number of Subjects per Group
  7. Dim intCtr As Integer
  8. Dim intCtr2 As Integer
  9. Dim intMyRandom As Integer
  10. Dim RandomNumbers() As Integer
  11. Const conRND_SUBJECTS_PER_GROUP As Byte = 20    'Random Subjects/Group required
  12.  
  13. Randomize
  14.  
  15. Set MyDB = CurrentDb
  16.  
  17. ReDim RandomNumbers(1 To conRND_SUBJECTS_PER_GROUP)
  18.  
  19. 'Recordset representing Distinct Groups in tblData
  20. strSQL1 = "SELECT DISTINCT [SecondaryID] FROM tblData"
  21.  
  22. Set rstUniqueGroups = MyDB.OpenRecordset(strSQL1, dbOpenSnapshot, dbOpenForwardOnly)
  23.  
  24. With rstUniqueGroups
  25.   Do While Not .EOF
  26.     'Recordset representing Subjects for each Group in tblData
  27.     strSQL2 = "SELECT * FROM tblData WHERE [SecondaryID] = '" & ![SecondaryID] & "';"
  28.     Set rstSubjectsForGroup = MyDB.OpenRecordset(strSQL2, dbOpenSnapshot)
  29.       rstSubjectsForGroup.MoveLast: rstSubjectsForGroup.MoveFirst
  30.       intNumOfRecs = rstSubjectsForGroup.RecordCount
  31.         'Do we have enough Subjects for the Group? (20)
  32.         If intNumOfRecs < conRND_SUBJECTS_PER_GROUP Then
  33.           Debug.Print "Not enough Subjects for " & ![SecondaryID]
  34.         Else
  35.           For intCtr = 1 To conRND_SUBJECTS_PER_GROUP       'Array to hold 20 Randoms
  36.             intMyRandom = Int(Rnd() * intNumOfRecs)
  37.               RandomNumbers(intCtr) = intMyRandom
  38.           Next
  39.  
  40.           'RandomNumbers() now contains conRND_SUBJECTS_PER_GROUP Subjects (as Indices) per Group,
  41.           'but are these Randoms all Unique, The next Step will remove any Dups and Replace
  42. DoItAllOverAgain:
  43.           For intCtr = 1 To UBound(RandomNumbers)
  44.             For intCtr2 = 1 To UBound(RandomNumbers)
  45.               If intCtr <> intCtr2 Then
  46.                 If RandomNumbers(intCtr) = RandomNumbers(intCtr2) Then
  47.                   RandomNumbers(intCtr) = Int(Rnd() * intNumOfRecs)
  48.                     GoTo DoItAllOverAgain
  49.                 End If
  50.               End If
  51.             Next intCtr2
  52.           Next intCtr
  53.  
  54.           'RandomNumbers() now contains conRND_SUBJECTS_PER_GROUP Random, Unique Values which
  55.           'will be Indices into the Recordset
  56.           For intCtr = 1 To UBound(RandomNumbers)
  57.             rstSubjectsForGroup.Move RandomNumbers(intCtr)
  58.               Debug.Print ![SecondaryID] & " | " & "Subject: " & rstSubjectsForGroup![SubjectID]
  59.                 rstSubjectsForGroup.MoveFirst
  60.           Next
  61.           Debug.Print "**********************************"
  62.         End If
  63.     .MoveNext
  64.   Loop
  65. End With
  66.  
  67. rstUniqueGroups.Close
  68. rstSubjectsForGroup.Close
  69. Set rstUniqueGroups = Nothing
  70. Set rstSubjectsForGroup = Nothing
Expand|Select|Wrap|Line Numbers
  1. Group A | Subject: A12
  2. Group A | Subject: A7
  3. Group A | Subject: A2
  4. Group A | Subject: A15
  5. Group A | Subject: A8
  6. Group A | Subject: A13
  7. Group A | Subject: A3
  8. Group A | Subject: A20
  9. Group A | Subject: A9
  10. Group A | Subject: A14
  11. Group A | Subject: A16
  12. Group A | Subject: A6
  13. Group A | Subject: A1
  14. Group A | Subject: A19
  15. Group A | Subject: A17
  16. Group A | Subject: A10
  17. Group A | Subject: A5
  18. Group A | Subject: A4
  19. Group A | Subject: A18
  20. Group A | Subject: A11
  21. **********************************
  22. Group D | Subject: D6
  23. Group D | Subject: D9
  24. Group D | Subject: D4
  25. Group D | Subject: D18
  26. Group D | Subject: D16
  27. Group D | Subject: D10
  28. Group D | Subject: D13
  29. Group D | Subject: D19
  30. Group D | Subject: D17
  31. Group D | Subject: D12
  32. Group D | Subject: D1
  33. Group D | Subject: D8
  34. Group D | Subject: D5
  35. Group D | Subject: D14
  36. Group D | Subject: D2
  37. Group D | Subject: D11
  38. Group D | Subject: D7
  39. Group D | Subject: D20
  40. Group D | Subject: D15
  41. Group D | Subject: D3
  42. **********************************
  43. Group X | Subject: X18
  44. Group X | Subject: X11
  45. Group X | Subject: X7
  46. Group X | Subject: X19
  47. Group X | Subject: X10
  48. Group X | Subject: X12
  49. Group X | Subject: X4
  50. Group X | Subject: X8
  51. Group X | Subject: X6
  52. Group X | Subject: X1
  53. Group X | Subject: X20
  54. Group X | Subject: X17
  55. Group X | Subject: X2
  56. Group X | Subject: X9
  57. Group X | Subject: X5
  58. Group X | Subject: X14
  59. Group X | Subject: X3
  60. Group X | Subject: X16
  61. Group X | Subject: X15
  62. Group X | Subject: X13
  63. **********************************
Jan 18 '12 #20

Rabbit
Expert Mod 10K+
P: 12,315
Just got back the result for ID 1, 0.5122. So things seem to be off to a good start. Going to lunch now, I'll post the rest of the results when I get back.

PS. Oops, just saw a mistake in my code, rerunning.
Jan 18 '12 #21

lilp32
P: 43
@NeoPa: I realized my mistake, thanks! I still haven't gotten the VBA code to work.

I ran the code in Post #18 and I got a total of 208 random numbers from 14 out of 20 groups. The distribution is as follows (SecondaryID, Count of SubjectIDs selected):

1 15
2 14
3 18
6 20
7 11
8 12
9 12
11 19
12 16
13 11
15 13
17 20
20 13
21 14
Jan 18 '12 #22

Rabbit
Expert Mod 10K+
P: 12,315
Here are the results
Expand|Select|Wrap|Line Numbers
  1.  1 :: 0.4984
  2.  2 :: 0.5054
  3.  3 :: 0.5073
  4.  4 :: 0.5111
  5.  5 :: 0.4924
  6.  6 :: 0.5128
  7.  7 :: 0.5709
  8.  8 :: 0.5244
  9.  9 :: 0.4915
  10. 10 :: 0.5266
Aside from #7, the results seem encouraging.
Jan 18 '12 #23

Rabbit
Expert Mod 10K+
P: 12,315
@lilp32, I suspect this is an artifact of using SecondaryID and SubjectID to seed the random rather than a true unique ID. If possible, replace it with a unique ID field.
Jan 18 '12 #24

lilp32
P: 43
I can replace SubjectID with a unique ID field but SecondaryID is a group ID field. I replaced the SubjectID with a unique ID and got 220 numbers.
Jan 18 '12 #25

Rabbit
Expert Mod 10K+
P: 12,315
When I said unique ID, I meant unique for a record, not unique for the field.
Jan 18 '12 #26

lilp32
P: 43
Sorry, I'm not sure what that means.
Jan 18 '12 #27

Rabbit
Expert Mod 10K+
P: 12,315
Expand|Select|Wrap|Line Numbers
  1. ID Sec Subject
  2. 1  A   1
  3. 2  A   2
  4. 3  B   1
  5. 4  B   3
ID is unique for the record.
Jan 18 '12 #28

lilp32
P: 43
Thanks, I do have a unique ID but how do I use this to "seed the random"?
Jan 18 '12 #29

Rabbit
Expert Mod 10K+
P: 12,315
Is it numeric? It would have to be numeric. Note the difference in the code.
Expand|Select|Wrap|Line Numbers
  1. SELECT [t1].SecondaryID, [t1].SubjectID 
  2. FROM   ( 
  3.      SELECT *  
  4.           , Rnd(-Minute(Now()) * Second(Now()) * [UNIQUEID]) As RandNo  
  5.      FROM   [tblDATA] 
  6. ) AS [t1] 
  7. INNER JOIN ( 
  8.      SELECT *  
  9.           , Rnd(-Minute(Now()) * Second(Now()) * [UNIQUEID]) As RandNo  
  10.      FROM   [tblDATA] 
  11. ) AS [t2] 
  12. ON [t1].SecondaryID = [t2].SecondaryID 
  13.    AND [t1].RandNo >= [t2].RandNo 
  14. GROUP BY [t1].SecondaryID, [t1].SubjectID 
  15. HAVING COUNT(*) <= 20 
Jan 18 '12 #30

lilp32
P: 43
Works perfectly! Thanks so much!!!
Jan 18 '12 #31

Rabbit
Expert Mod 10K+
P: 12,315
Not a problem, good luck.
Jan 18 '12 #32

Rabbit
Expert Mod 10K+
P: 12,315
The results of 100000 runs each on 50 unique IDs. There are a few outliers but for the most part, it approaches random.
Expand|Select|Wrap|Line Numbers
  1.  1 :: 0.5097
  2.  2 :: 0.4913
  3.  3 :: 0.4957
  4.  4 :: 0.4857
  5.  5 :: 0.5064
  6.  6 :: 0.4436
  7.  7 :: 0.3428
  8.  8 :: 0.5
  9.  9 :: 0.52
  10. 10 :: 0.5037
  11. 11 :: 0.4366
  12. 12 :: 0.5158
  13. 13 :: 0.469
  14. 14 :: 0.4779
  15. 15 :: 0.49
  16. 16 :: 0.5013
  17. 17 :: 0.5315
  18. 18 :: 0.4691
  19. 19 :: 0.5175
  20. 20 :: 0.541
  21. 21 :: 0.4312
  22. 22 :: 0.421
  23. 23 :: 0.4748
  24. 24 :: 0.4604
  25. 25 :: 0.5207
  26. 26 :: 0.5059
  27. 27 :: 0.4522
  28. 28 :: 0.4169
  29. 29 :: 0.6054
  30. 30 :: 0.5955
  31. 31 :: 0.5092
  32. 32 :: 0.5134
  33. 33 :: 0.5379
  34. 34 :: 0.5026
  35. 35 :: 0.4889
  36. 36 :: 0.5094
  37. 37 :: 0.4758
  38. 38 :: 0.452
  39. 39 :: 0.5282
  40. 40 :: 0.4668
  41. 41 :: 0.4673
  42. 42 :: 0.4998
  43. 43 :: 0.4969
  44. 44 :: 0.4396
  45. 45 :: 0.5248
  46. 46 :: 0.5087
  47. 47 :: 0.5215
  48. 48 :: 0.4943
  49. 49 :: 0.4541
  50. 50 :: 0.4971
Jan 18 '12 #33

NeoPa
Expert Mod 15k+
P: 31,186
Rabbit (Post #18):
As for post #11, I prefer a join approach over a subquery in terms of speed.
I'm sure you do. I would too if there were a version that gave valid results. I don't believe that's the case with what you posted in post #18 though, for the reasons I mentioned earlier (post #11). It doesn't seem sensible to compromise the results for reasons of performance (Hence I went for the less efficient approach that, as far as I can see, gives reliable results logically).

Ref post #30.
Certainly it is better to use the [UNIQUEID] field in the seeding, but this doesn't. It multiplies by a [UNIQUEID] instead. This has the effect that the results may well look random, but are not. It is my belief (until I see something else new to be fair) that this approach cannot yield the results you request. Assume for the sake of understanding that [UNIQUEID] is allocated in order of records created. As the product of the result of Rnd() * [UNIQUENO] will always be a value which follows the pattern 0 <= X < [UNIQUENO], it follows that more recent records will tend towards those to be selected. Nothing is easily testable as the randomness is always involved too, but unless [UNIQUENO] is added as part of the seed this will always be flawed. Of course if that were the case, then this approach wouldn't work anyway.

I'm happy to be proven wrong here, but I'll require more than the fact that the results seem random (as I'd expect that anyway). That doesn't counter the knowledge that largere numbered [UNIQUEID] values (more recent records) would be weighted over those with lower values (entered earlier).
Jan 18 '12 #34

NeoPa
Expert Mod 15k+
P: 31,186
Lilp32:
@NeoPa: I realized my mistake, thanks! I still haven't gotten the VBA code to work.
Let me know what you're stuck on and I'll be happy to help.

PS. As per my response in post #34, Rabbit is absolutely right in that my suggested SQL code in post #11 would not run as fast as the SQL using JOINs (I believe it would actually work reliably, but I'm not above wanting confirmation on that score from Rabbit who is, as you'll see, one of the best SQL exponents you could hope to come across). As such, I'm not sure whether or not it would run faster or slower than the code solution from post #12 (which references the routine from post #10).
Jan 18 '12 #35

Rabbit
Expert Mod 10K+
P: 12,315
The only time that a join would return incorrect results is when Rnd() returns the same result for different records in the same group. At which point you would have to balance veracity against speed. And that would be highly dependent on how many records they have. I think at most you would get 19 records rather then your 20 and even that would be pretty a rare occurrence. If this is not the shortcoming you're referring to, can you explain?

I don't think I suggested that the records be picked by Rnd() * [UniqueID] as that would definitely result in incorrect results. Rather, I meant that Rnd(-1 * [seconds or milliseconds] * [UniqueID]) should be used to get the random number. Using that as the seed should result in near randomness for any [UniqueID]. I believe the code in 30 and the results in 33 show that, at least for the numbers 1-50, any one ID approaches randomness.

I suppose the next step would be to show randomness between any two sets of IDs. I can make test code for that as well. Because while randomness is, by definition, difficult to test, it is not impossible to test all possible seed values that may be used to seed the generator.

In the end, I believe the key to approaching randomness would be to multiply more and more time values into the seed.
Jan 19 '12 #36

NeoPa
Expert Mod 15k+
P: 31,186
Rabbit:
I don't think I suggested that the records be picked by Rnd() * [UniqueID] as that would definitely result in incorrect results.
Indeed. You are correct on that point. I checked over and over again and still managed to get confused on that point, but now I look yet again, I see it was, indeed, within the seed of the Rnd() call (Not only in the most recent example but also in posts #6 and #18).

My other objection was based on the fact that the [RandNo] values for the same records in [T1] and [T2] were identical (assuming Now() is a value that is only set once for the whole run and simply reused as that static value for each reference). I'm ashamed to say that on that score I was also mistaken (Either that or I'm just too tired to think clearly now). This is actually an essential part of the logic.

If I was impressed before, I find I now have the urge to bow before a SQL deity. I see nothing wrong with it at all (If I'm being really picky I might say to update line #14 to GROUP BY T1.UNIQEUEID, but that's just a detail you didn't catch up with after that field was provided late).

@Lilp32.
Go ahead and mark post #30 as Best Answer. I'll be happy to take you through my code still, if you want, but Rabbit's solution is, frankly, in a class of its own. I'm just checking again in case my enthusiasm is a result of tiredness and I'm missing something, but I really don't think so. It also seems more straightforward now somehow (which illustrates its code-elegance). No. I'm happy it's the doggy's doo-dahs.

Rabbit:
In the end, I believe the key to approaching randomness would be to multiply more and more time values into the seed.
No. I led you up the garden path there I'm afraid (inadvertently of course). The code is perfectly random I see now. Any perceived lack of randomness was down to my misunderstanding of where the parentheses left your multiplication of the value(s) from within the record.
Jan 19 '12 #37

Rabbit
Expert Mod 10K+
P: 12,315
I wouldn't say that you steered me wrong per se. Only that you sent me down an inevitable and necessary path to check whether or not it is truly random between any two numbers.

I plan on running the following simulation when I get back in the office tomorrow.
Expand|Select|Wrap|Line Numbers
  1. For seconds = 1 To 60
  2.    For minutes = seconds + 1 To 60
  3.       win = 0
  4.  
  5.       For id1 = 1 To 10001 Step 100
  6.          For id2 = id1 + 100 To 10001 Step 100
  7.             If rnd(minutes * seconds * id1) > rnd(minutes * seconds * id2) Then
  8.                win = win + 1
  9.             End If
  10.          Next id2
  11.       Next id1
  12.  
  13.       Debug.Print win / 3600
  14.    Next minutes
  15. Next seconds
The win rate for any two number combination should approach 50%.
Jan 19 '12 #38

Rabbit
Expert Mod 10K+
P: 12,315
I only went from 1 to 1001 with a step of 100. But here are the results:
Expand|Select|Wrap|Line Numbers
  1. 1,101,0.497777777777778
  2. 1,201,0.506388888888889
  3. 1,301,0.508888888888889
  4. 1,401,0.487222222222222
  5. 1,501,0.493333333333333
  6. 1,601,0.495277777777778
  7. 1,701,0.495
  8. 1,801,0.489444444444444
  9. 1,901,0.498055555555556
  10. 1,1001,0.494444444444444
  11. 101,201,0.501944444444444
  12. 101,301,0.505
  13. 101,401,0.509444444444444
  14. 101,501,0.5125
  15. 101,601,0.487222222222222
  16. 101,701,0.498055555555556
  17. 101,801,0.514722222222222
  18. 101,901,0.496111111111111
  19. 101,1001,0.498888888888889
  20. 201,301,0.510833333333333
  21. 201,401,0.501111111111111
  22. 201,501,0.500277777777778
  23. 201,601,0.500555555555556
  24. 201,701,0.496111111111111
  25. 201,801,0.500555555555556
  26. 201,901,0.491388888888889
  27. 201,1001,0.506388888888889
  28. 301,401,0.507777777777778
  29. 301,501,0.493055555555556
  30. 301,601,0.4975
  31. 301,701,0.4925
  32. 301,801,0.490555555555556
  33. 301,901,0.487777777777778
  34. 301,1001,0.499166666666667
  35. 401,501,0.506944444444444
  36. 401,601,0.488888888888889
  37. 401,701,0.494166666666667
  38. 401,801,0.492777777777778
  39. 401,901,0.495
  40. 401,1001,0.503055555555556
  41. 501,601,0.499166666666667
  42. 501,701,0.496666666666667
  43. 501,801,0.489166666666667
  44. 501,901,0.493333333333333
  45. 501,1001,0.495277777777778
  46. 601,701,0.495555555555556
  47. 601,801,0.498333333333333
  48. 601,901,0.496666666666667
  49. 601,1001,0.501666666666667
  50. 701,801,0.491666666666667
  51. 701,901,0.506111111111111
  52. 701,1001,0.504166666666667
  53. 801,901,0.512222222222222
  54. 801,1001,0.504166666666667
  55. 901,1001,0.503055555555556
For the most part, it seems to approach random. The largest difference seems to be a 2.4% bias towards one number over another.

P.S. As expected, multiplying more time factors into the seed reduces variance. Minutes and seconds is good but minutes, seconds, hours is better and so on.
Jan 19 '12 #39

NeoPa
Expert Mod 15k+
P: 31,186
Rabbit:
P.S. As expected, multiplying more time factors into the seed reduces variance. Minutes and seconds is good but minutes, seconds, hours is better and so on.
Indeed. I left out the Hours when I looked at my version as that would have risked an overflow condition. Two fields from the record were required at that time though. Now [UNIQUEID] is available the Hours can be handled without risk of overflow.
Jan 19 '12 #40

Post your reply

Sign in to post your reply or Sign up for a free account.