469,266 Members | 2,063 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,266 developers. It's quick & easy.

Last 3 records by group

How can I retrieve the last 3 records from the following table. The table has the following fields and sample data:

ClientID TransactionID TransactionDate

1 6511216 01/02/2012
1 5332573 18/05/2011
1 9849528 11/02/2012
1 5374530 24/08/2010
1 5711675 26/04/2009
1 4001184 01/09/2011
2 9087526 15/07/2011
2 6524824 06/08/2009
2 5376892 26/07/2011
2 5327891 28/10/2008
2 6423568 11/11/2011
2 5379827 16/06/2012

the results should be (order by TransactionDate):

ClientID TransactionID TransactionDate

1 4001184 01/09/2011
1 6511216 01/02/2012
1 9849528 11/02/2012
2 5376892 26/07/2011
2 6423568 11/11/2011
2 5379827 16/06/2012


Note: I can't use 'Select Top 3' a. because of the month format b. this must work for a table where TransactionDate is a different field with text data.
Feb 11 '12
69 7464
Rabbit
12,516 Expert Mod 8TB
I didn't think that it would be a large difference. The problem is that the 150,000 records are joined to itself. So it's practically comparing 150000 ^ 2. And that's a lot of records to process. That would be the largest contributing factor to the speed of and would far outweigh the overhead of VBA.

However, this may be because of a lack of proper indexes. Perhaps an index on the client and month would improve the speed of the SQL.

Now, if we were talking about SQL Server 2005+, it has a row numbering function that would significantly speed up the processing because you don't have to join the table to itself. I'm talking seconds rather than minutes.
Feb 14 '12 #51
ADezii
8,800 Expert 8TB
@Rabbit:
I also do not think that
testing the performance on 100 rows and then multiplying the result by 1500 (total of 150000)
is a reliable Benchmark. The Testing needs to be performed on the entire Data set of 150,000 Records. Comments?
Feb 14 '12 #52
Rabbit
12,516 Expert Mod 8TB
Hard to say, because the numbers seem to be in line with what I got. I tested the SQL on 170,000 rows and it took a little under 6 minutes.

A full test couldn't hurt, if it's only 6-7 minutes it would be beneficial to just do a full test.
Feb 14 '12 #53
Rabbit
12,516 Expert Mod 8TB
I whipped this up. It's a simplified replication of the ROW_NUMBER() function in SQL Server. It should be many magnitudes faster.

Expand|Select|Wrap|Line Numbers
  1. Option Compare Database
  2. Option Explicit
  3.  
  4. Dim row As Double
  5. Dim strPartition As String
  6.  
  7. Public Function RowNum(partition As String) As Double
  8.     If partition <> strPartition Then
  9.         row = 0
  10.         strPartition = partition
  11.     End If
  12.  
  13.     row = row + 1
  14.     RowNum = row
  15. End Function
Expand|Select|Wrap|Line Numbers
  1. SELECT partitionField, 
  2.    orderField,
  3.    otherField
  4. FROM someTable
  5. WHERE RowNumber(partitionField) < 4
  6. ORDER BY partitionField,
  7.    orderField
PS: It's not as quick as the SQL Server version but it takes half the time as the previous solutions. I ran it on 170000 records and it took a little under 3 minutes.
Feb 14 '12 #54
NeoPa
32,171 Expert Mod 16PB
Moishy101:
Then I get a Cannot group on fields selected with '*' (qTH1). error.
OK. That means Jet SQL needs it spelled out long-hand. Unfortunate, as I was trying to minimise unnecessary verbage :-(

Expand|Select|Wrap|Line Numbers
  1. SELECT   qTH1.ClientID
  2.        , qTH1.TransactionNumber
  3.        , qTH1.MonthOrder
  4. FROM     [qryTransactionHebrew] AS [qTH1]
  5.          INNER JOIN
  6.          [qryTransactionHebrew] AS [qTH2]
  7.   ON     (qTH1.ClientID = qTH2.ClientID)
  8.  AND     (qTH1.MonthOrder >= qTH2.MonthOrder)
  9. GROUP BY qTH1.ClientID
  10.        , qTH1.TransactionNumber
  11.        , qTH1.MonthOrder
  12. HAVING   Count(*) < 4
  13. ORDER BY qTH1.ClientID ASC
  14.        , qTH1.MonthOrder DESC
Essentially the same as Rabbit's version but with more meaningful names (Well, you couldn't expect him to read my mind now, could you).
Feb 15 '12 #55
NeoPa
32,171 Expert Mod 16PB
On the performance issue, it would be interesting (think critical) to know whether or not [ClientID] is indexed. I would normally expect the SQL only version to be significantly faster, all else being equal. I didn't study the VBA suggested closely enough to determine if it was implementing the same logic as the SQL. Nor can I honestly say what I mean by significantly faster. A little more than was shown to be sure, but not necessarily much more.

I believe the killer issues are :
  1. The number of records involved.
  2. The fact that the data of each record needs to be run through [qryTransactionHebrew] before it's sorted.

PS. Did the OP ever comment on the problem related to months of different years, raised in post #19?
Feb 15 '12 #56
Rabbit:
I'm not sure what your pseudo code means, can you please elaborate?

NeoPa:

Thank you very much, it works well.
Feb 15 '12 #57
Rabbit
12,516 Expert Mod 8TB
It wasn't pseudo code. Well, the SQL query won't run with your table set up but it's syntactically correct. It basically numbers the rows breaking by the grouping field and then uses that to filter the rows. It should run at least twice as fast as the other methods.

NeoPa, I don't think that they did.
Feb 15 '12 #58
NeoPa:

Regarding the problem related to months of different years, raised in post #19, there is another problem, the Hebrew calendar has 6 leap years (13 months instead of 12) in a 19 year cycle, so the order will not be accurate.
but both of those issues can be solved quite simply albeit not in a smart or professional manner, in tblHebrewMonthOrder I can add the values for the coming years (expected lifetime of the mdb).
Feb 15 '12 #59
Rabbit:

What table setup do I need for your new query to work?
Feb 15 '12 #60
Is it was possible to have a separate auto-numbered field for each ClientID, if so would grabbing the last three records for each ClientID (based upon the auto-numbered field) be any more effective?

If the answer the above in yes, how can I implement it?
Feb 15 '12 #61
Rabbit
12,516 Expert Mod 8TB
In response to post 60, you don't have to change your table setup, you just have to use different sql, mine was just a syntactically correct example of how to use the code.

In response to post 61, that's what my code does.
Feb 15 '12 #62
NeoPa
32,171 Expert Mod 16PB
Moishy101:
but both of those issues can be solved quite simply albeit not in a smart or professional manner, in tblHebrewMonthOrder I can add the values for the coming years (expected lifetime of the mdb).
That makes sense actually. Sometimes you just have to have a kludgy solution when the data you're dealing with doesn't fit anywhere neatly. Clearly the [MonthOrder] values will go beyond thirteen in that case ;-)

For Rabbit's SQL you need to treat the references as :
  1. [someTable] == [qryTransactionHebrew]
  2. [partitionField] == [ClientID]
  3. [orderField] == [MonthOrder]
  4. [otherField] == [TransactionNumber]

Thus, the SQL would look something like :
Expand|Select|Wrap|Line Numbers
  1. SELECT   [ClientID]
  2.        , [TransactionNumber]
  3.        , [MonthOrder]
  4. FROM     [tblTransactionHebrew]
  5. WHERE    (RowNumber([ClientID]) < 4)
  6. ORDER BY [ClientID] ASC
  7.        , [MonthOrder] DESC
Feb 15 '12 #63
NeoPa
32,171 Expert Mod 16PB
I'd be surprised if that works exactly as intended. Hear me out on this, because I may be off-base and need correcting.

The WHERE clause is applied to the incoming data. As such, the ORDER BY clause has not had a chance to effect the order. I would expect this to reflect the original order, rather than that applied depending on the Hebrew month (which, I expect, would explain why Rabbit's test threw up no flags).

Is it possible to use a HAVING clause in a query which has no GROUP BY clause and no aggregation functions?
In case it is, try the following :
Expand|Select|Wrap|Line Numbers
  1. SELECT   [ClientID]
  2.        , [TransactionNumber]
  3.        , [MonthOrder]
  4. FROM     [tblTransactionHebrew]
  5. HAVING   (RowNumber([ClientID]) < 4)
  6. ORDER BY [ClientID] ASC
  7.        , [MonthOrder] DESC
I'm happy to be enlightened either way :-)
Feb 15 '12 #64
Rabbit
12,516 Expert Mod 8TB
Yes, I was worried about that being the case. But after creating the code and testing, it actually does take the order by into account before running the code. Otherwise, I was prepared to use a subquery to do the sorting before filtering.
Feb 15 '12 #65
NeoPa:
For Rabbit's SQL you need to treat the references as :

1. [someTable] == [qryTransactionHebrew]
2. [partitionField] == [ClientID]
3. [orderField] == [MonthOrder]
4. [otherField] == [TransactionNumber]
[MonthOrder] is in a separate table so the sql will have to be adapted a little bit.

Unfortunately it is not possible to use a HAVING clause in a query which has no GROUP BY clause and no aggregation functions, and so I get a "HAVING clause (RowNumber([ClientID])<4) without grouping or aggregation." error.

The WHERE clause is applied to the incoming data. As such, the ORDER BY clause has not had a chance to effect the order. I would expect this to reflect the original order, rather than that applied depending on the Hebrew month (which, I expect, would explain why Rabbit's test threw up no flags).
The order is fine, the problem is that it shows all records instead of the last three only.
Feb 15 '12 #66
Rabbit
12,516 Expert Mod 8TB
It looks like I was wrong. The WHERE clause is using the original order in the data source. I was just tricked because I included it as an output field and that seemed to be numbering the rows correctly. And no method of subquerying would fix the issue.

The only workaround would be to write out to a temporary table the sorted recordset before running the query. But that may be more trouble than it's worth. It should still be faster but it just becomes more complicated.
Feb 15 '12 #67
NeoPa
32,171 Expert Mod 16PB
Moishy101:
[MonthOrder] is in a separate table so the sql will have to be adapted a little bit.
No. It may be in a separate table from the transactions, but the source for the SQL ([someTable]) is [qryTransactionHebrew], which contains that data already.

Rabbit:
The only workaround would be to write out to a temporary table the sorted recordset before running the query. But that may be more trouble than it's worth. It should still be faster but it just becomes more complicated.
Unfortunate, after all that thinking and working out :-(

An alternative might be to add a field to the table (You may as well do two actually) to handle the ordering, which get updated when, and only when, the report is required. In all other circumstances, they should be considered to contain no valid data. With the relative data updated I don't expect the processing for the report would take too long. The updating might take some care to avoid making it take too long though.
Feb 15 '12 #68
Thank you all for your help, I'm busy with several other projects right now, I hope to get back to this issue in a week or so.
Mar 2 '12 #69
NeoPa
32,171 Expert Mod 16PB
No worries. I'm sure we can look at this again when you're next ready.

BTW. I missed a point from an earlier post which warrants a reply :
Moishy101:
The order is fine, the problem is that it shows all records instead of the last three only.
This shows that you haven't understood the point I was making from post #64. You may want to read it again, but it's a technical reason for why selecting the last three in the correct order doesn't work as intended. There's no point in selecting the last three of a set of records which are not already ordered in the correct sequence as that would simply result in three random records.
Mar 2 '12 #70

Post your reply

Sign in to post your reply or Sign up for a free account.

Similar topics

1 post views Thread by Matt | last post: by
1 post views Thread by CARIGAR | last post: by
reply views Thread by zhoujie | last post: by
reply views Thread by suresh191 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.