By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
431,805 Members | 1,273 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 431,805 IT Pros & Developers. It's quick & easy.

Select top N by group problem

P: 2
I have a problem that's driving me crazy. i have a table that is a series of things like this
Expand|Select|Wrap|Line Numbers
  1. (Unit type) (part description)  (count)
  2. IWU   High pressure flowmeter      3
  3. IWU   Transformer                  2
  4. IWU   Fender                       2
  5. GEN   Fender                       2
  6. GEN   Antifreeze                   2
  7. GEN   Connection                   1
  8. 60A   Relief valve                 1
  9. 60A   High Pressure flow           1
There's a lot more values but you get the idea. I want to get the 5 most used parts per unit. The problem is, when i use the Select top 5 code it keeps giving me more records than I want where the 5th one has the same number as the next few records. Example: It works when the top 5 are 6 5 4 3 2 and the next highest is 1 but doesn't work when the top 5 are 5 4 3 2 1 and the next highest is 1. I have it organized descending by group so I really don't need it to look at the numbers. I just need it to pick the top 5 rows for each group and I can't get it to do that. I'm desperate!
Expand|Select|Wrap|Line Numbers
  1. SELECT tblIncidentByPartStep4.UnitType,
  2.        tblIncidentByPartStep4.PartDescription,
  3.        tblIncidentByPartStep4.New,
  4.        tblIncidentByPartStep4.Count
  5.  
  6. FROM tblIncidentByPartStep4
  7.  
  8. WHERE (((tblIncidentByPartStep4.Count) In (
  9.          SELECT top 5 Count
  10.          FROM tblIncidentByPartStep4 AS Dupe
  11.          WHERE dupe.Unittype = tblincidentByPartStep4.UnitType
  12.          ORDER BY dupe.count desc )));
Sep 13 '08 #1
Share this Question
Share on Google+
6 Replies


FishVal
Expert 2.5K+
P: 2,653
Hello, linzeyd.

The problem is that the table has no primary key or any other unique field.
Add one and change WHERE clause in main query to check whether the unique field is within TOP 5 subset returned by the subquery.

Regards,
Fish
Sep 13 '08 #2

FishVal
Expert 2.5K+
P: 2,653
Actually, after a preliminary testing, appears that even with PK field Access db engine shows definite flaws in certain combinations of TOP predicate and ordering by a field having duplicates.
I think it relates to query execution order internal optimizations, because TOP predicate works well with already ordered subquery:
Expand|Select|Wrap|Line Numbers
  1. SELECT TOP ..  * FROM (SELECT * FROM <table> WHERE ... ORDER BY ...);
  2.  
Having played a little with it I would say that SQL/VBA hybride would be preferrable.

Names in example below are slightly different but I guess quite recognizible.
[keyID] is PK field - Autonumber(Long), but could be any unique field of any type.

Code module:
Expand|Select|Wrap|Line Numbers
  1. Public Function IsInTop(varPK As Variant, varUnitType As Variant, lngTopCount As Long) As Boolean
  2.  
  3.     Dim rs As New ADODB.Recordset
  4.  
  5.     If IsNull(varPK) Or IsNull(varUnitType) Then Exit Function
  6.  
  7.     With rs
  8.         .ActiveConnection = CurrentProject.Connection
  9.         .CursorType = adOpenForwardOnly
  10.         .LockType = adLockReadOnly
  11.         .Open "SELECT keyID FROM t WHERE txtUnitType='" & varUnitType & _
  12.             "' ORDER BY lngCount DESC;"
  13.         For i = 1 To lngTopCount
  14.             If !keyID = varPK Then IsInTop = True: GoTo ExitPoint
  15.             .MoveNext
  16.         Next i
  17. ExitPoint:
  18.         .Close
  19.     End With
  20.  
  21.     Set rs = Nothing
  22.  
  23. End Function
  24.  
Query:
Expand|Select|Wrap|Line Numbers
  1. SELECT t.*
  2. FROM t
  3. WHERE IsInTop(t.keyID, t.txtUnitType,5);
  4.  
Sep 13 '08 #3

NeoPa
Expert Mod 15k+
P: 31,470
Actually, this behaviour is by design. The subquery, in which the TOP predicate is used, will return all records where the TOP 5 found values match. It does not guarantee to return only five records.

Furthermore, even had it returned only 5 records (a level of random guessing would be needed to determine which to select as the SQL doesn't specify), the values would be the same list of values anyway. This means that your SQL would STILL show however many records match those top 5 values, regardless of how many are returned in the subquery.

To avoid this behaviour (not very straightforward I'm afraid), add another field (preferably something unique) to the ORDER BY clause and then include this also within the WHERE clause of the main SQL.

There's more than one way to do this, but I'll include an example to illustrate :
Expand|Select|Wrap|Line Numbers
  1. SELECT [UnitType],
  2.        [PartDescription],
  3.        [New],
  4.        [Count]
  5.  
  6. FROM tblIncidentByPartStep4
  7.  
  8. WHERE [Count] & [PartDescription] In(
  9.          SELECT TOP 5 [Count] & [PartDescription]
  10.          FROM tblIncidentByPartStep4 AS Dupe
  11.          WHERE Dupe.UnitType=tblincidentByPartStep4.UnitType
  12.          ORDER BY [Count] DESC,
  13.                   [PartDescription])
Sep 14 '08 #4

FishVal
Expert 2.5K+
P: 2,653
Actually, after a preliminary testing, appears that even with PK field Access db engine shows definite flaws in certain combinations of TOP predicate and ordering by a field having duplicates.
I think it relates to query execution order internal optimizations, because TOP predicate works well with already ordered subquery:
Expand|Select|Wrap|Line Numbers
  1. SELECT TOP ..  * FROM (SELECT * FROM <table> WHERE ... ORDER BY ...);
  2.  
;)

Actually that is well-known and even specified behavior.
It is not a bad idea to RTFM sometimes.

TOP n [PERCENT]
Returns a certain number of records that fall at the top or the bottom of a range specified by an ORDER BY clause. Suppose you want the names of the top 25 students from the class of 1994:
Expand|Select|Wrap|Line Numbers
  1. SELECT TOP 25
  2. FirstName, LastName
  3. FROM Students
  4. WHERE GraduationYear = 1994
  5. ORDER BY GradePointAverage DESC;
  6.  
If you do not include the ORDER BY clause, the query will return an arbitrary set of 25 records from the Students table that satisfy the WHERE clause.

The TOP predicate does not choose between equal values. In the preceding example, if the twenty-fifth and twenty-sixth highest grade point averages are the same, the query will return 26 records.
Regards,
Fish
Sep 15 '08 #5

NeoPa
Expert Mod 15k+
P: 31,470
;)

Actually that is well-known and even specified behavior.
It is not a bad idea to RTFM sometimes.

Regards,
Fish
Just to make this clear - Fish was talking to himself when he said this.

@Linzey, this was not directed at you at all.

@Fish, we can't know all things all the time. We get rusty on things sometimes. I know I did (and do).
Sep 15 '08 #6

FishVal
Expert 2.5K+
P: 2,653
Just to make this clear - Fish was talking to himself when he said this.
Just to make this absolutely clear - fishes don't talk at all. :P
Especially with human beings.
Sep 15 '08 #7

Post your reply

Sign in to post your reply or Sign up for a free account.