473,394 Members | 1,703 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,394 software developers and data experts.

Generating a query of top n by sum of a variable

I've got data in the following format:

Expand|Select|Wrap|Line Numbers
  1. Key  Group Product   Cost
  2. 1      1    itema   97.62
  3. 2      1    itema  139.12
  4. 3      1    itema  105.65
  5. 4      1    itema  130.09
  6. 5      1    itemb   88.95
  7. 6      1    itemb  124.14
  8. 7      1    itemc   90.78
  9. 8      1    itemc  111.21
  10. 9      1    itemc   29.70
  11. 10     1    itemc   86.56
  12. 11     2    itema  134.51
  13. 12     2    itema  132.92
  14. 13     2    itema   95.85
  15. 14     2    itema  110.70
  16. 15     2    itemb   70.24
  17. 16     2    itemb  104.59
  18. 17     2    itemc   83.69
  19. 18     2    itemc  120.05
  20. 19     2    itemc  104.20
  21. 20     2    itemc  101.09
  22. 21     3    itema   90.74
  23. 22     3    itema  127.02
  24. 23     3    itema   96.08
  25. 24     3    itema  123.52
  26. 25     3    itemb  129.32
  27. 26     3    itemb   85.67
  28. 27     3    itemc  114.09
  29. 28     3    itemc   72.28
  30. 29     3    itemc   60.02
  31. 30     3    itemc   91.68
...and I need to build a query that returns the top 1 (or n) products by sum of costs by group. I have tried a simple select top 1 (or n) query with grouping and sorting:

Expand|Select|Wrap|Line Numbers
  1. SELECT TOP 1 Testdf.group, Testdf.product, Sum(Testdf.cost) AS SumOfcost
  2. FROM Testdf
  3. GROUP BY Testdf.group, Testdf.product
  4. ORDER BY Sum(Testdf.cost) DESC;
...but it only returns one value (the single highest sum of cost). What I need is a query that returns 1 (or n) rows per group that contains the prouct with the highest sum of cost, grouped by group (e.g., 1, 2...). While it may seem trivial with the dataset I provided, however, the dataset I'm using has millions of records... HELP! :)
Sep 26 '11 #1
16 3706
NeoPa
32,556 Expert Mod 16PB
The TOP predicate of the SELECT clause pertains to the query as a whole and is nowhere defined as working within each group. IE. What you are returning with that SQL is exactly what would be expected - One record (or more if there were records with equal values).

You could try out the following SQL and see if it works for you for TOP n, otherwise

Expand|Select|Wrap|Line Numbers
  1. SELECT sub1.Group
  2.      , sub2.Product
  3.      , sub2.SumOfCost
  4. FROM   (SELECT DISTINCT [Group]
  5.         FROM   [TestDF]
  6.        ) AS sub1
  7.        INNER JOIN
  8.        (SELECT TOP 2
  9.                  [Group]
  10.                , [Product]
  11.                , Sum([Cost]) AS [SumOfCost]
  12.         FROM     [TestDF] AS TDF
  13.         WHERE    TDF.Group=sub1.Group
  14.         GROUP BY TDF.Group
  15.                , TDF.Product
  16.         ORDER BY [SumOfCost] DESC
  17.        ) AS sub2
  18.   ON   sub1.Group=sub2.Group
I've used TOP 2 simply as an illustration so, if this works, you can replace it with the value of your choice. I'm not in a good position to test it out though. Let me know how it goes when you do and we can take it from there.
Sep 26 '11 #2
Rabbit
12,516 Expert Mod 8TB
I don't think you can refer to an outer query from a subquery in the FROM clause. However, you should be able to move that logic within the WHERE clause and it should work just as well.

Having said that, another method is to join the table to itself and use aggregates to achieve a similar result.
Expand|Select|Wrap|Line Numbers
  1. SELECT T1.Category, Max(T1.Price)
  2. FROM tableName AS T1
  3. INNER JOIN tableName AS T2
  4. ON T1.Category = T2.Category
  5.    AND T1.Price <= T2.Price
  6.    AND T1.PK <= T2.PK
  7. GROUP BY T1.Category
  8. HAVING Count(*) = 1
Sep 26 '11 #3
NeoPa
32,556 Expert Mod 16PB
I suspect you may be right Rabbit, but you use of different field names makes it very hard for me to follow what you're trying to illustrate. Even if I change the names I can't see where it matches, but how much is wrong with the SQL and how much is my inability to follow it is not clear.
Sep 26 '11 #4
ADezii
8,834 Expert 8TB
The SQL Guys are going to kill me, but if you wish to take the simpler approach, simply Pass the [Group] and [Product] Values in a GROUP BY Query to a Public Function that does all the work. The down side is that Processing Time will probably be much greater that its SQL counterpart. You decide.
Expand|Select|Wrap|Line Numbers
  1. SELECT TestDF.Group, TestDF.Product, fCalcMAXCostPerGroup([Group],[Product]) AS MAX_Cost_Per_Group
  2. FROM TestDF
  3. GROUP BY TestDF.Group, TestDF.Product, fCalcMAXCostPerGroup([Group],[Product])
  4. ORDER BY TestDF.Group, TestDF.Product;
Expand|Select|Wrap|Line Numbers
  1. Public Function fCalcMAXCostPerGroup(lngGroup As Long, strProduct As String) As Double
  2.   fCalcMAXCostPerGroup = DMax("[Cost]", "TestDF", "[Group] = " & lngGroup & _
  3.                          " AND [Product] = '" & strProduct & "'")
  4. End Function
Query OUTPUT based on Data supplied in Post #1:
Expand|Select|Wrap|Line Numbers
  1. Group    Product    MAX_Cost_Per_Group
  2. 1        itema      139.12
  3. 1        itemb      124.14
  4. 1        itemc      111.21
  5. 2        itema      134.51
  6. 2        itemb      104.59
  7. 2        itemc      120.05
  8. 3        itema      127.02
  9. 3        itemb      129.32
  10. 3        itemc      114.09
Sep 26 '11 #5
Rabbit
12,516 Expert Mod 8TB
I like to change the names and use a more simplified example so that the OP has to put in some work.

But if I've confused you, then the OP will have a heck of a time.

Given the following data
Expand|Select|Wrap|Line Numbers
  1. PK Category Price
  2. 1  A        5
  3. 2  A        7
  4. 3  A        2
  5. 4  B        10
  6. 5  B        10
  7.  
The SQL join will result in
Expand|Select|Wrap|Line Numbers
  1. T1.PK T1.Category T1.Price T2.PK
  2. 1     A           5        1
  3. 1     A           5        2
  4. 2     A           7        2
  5. 3     A           2        1
  6. 3     A           2        2
  7. 3     A           2        3
  8. 4     B           10       4
  9. 4     B           10       5
  10. 5     B           10       5
  11.  
At this point, having written out this much out, I realize now that the SQL group by I posted is wrong. It should be grouped by the PK.

Anywho, those PK that have a count of 1 is as follows
Expand|Select|Wrap|Line Numbers
  1. PK Category Price
  2. 2  A        7
  3. 5  B        10
  4.  
Sep 26 '11 #6
NeoPa
32,556 Expert Mod 16PB
My main confusion was not so much that I was unable to cross-reference what you posted with the question at hand, as much as that I suspected there was no appropriate match. It didn't seem to me that your suggestion could be used in such a scenario. Unfortunately, to be sure I would have to reverse engineer your thinking, which I wasn't too keen to do in the circumstances. I appreciate your ongoing attempts to avoid simply providing solutions for the OP, but a side-effect is that it's extra difficult to review what you post for appropriateness.

As I say, I'm not convinced you have a workable solution, although I am convinced you're right when you say mine wouldn't work, but I don't want to try to check it out in encrypted form. As they say - Life's complicated enough.
Sep 26 '11 #7
Rabbit
12,516 Expert Mod 8TB
To put it into the terms of the OP's problem, it would be this.
Expand|Select|Wrap|Line Numbers
  1. SELECT T1.Group, T1.Product, MAX(T1.SumOfCost) AS MaxSumOfCost
  2. FROM (
  3. SELECT Group,
  4.    Product,
  5.    SUM(Cost) AS SumOfCost
  6. FROM Testdf
  7. GROUP BY Group,
  8.    Product
  9. ) AS T1
  10. INNER JOIN (
  11. SELECT Group,
  12.    Product,
  13.    SUM(Cost) AS SumOfCost
  14. FROM Testdf
  15. GROUP BY Group,
  16.    Product
  17. ) AS T2
  18. ON T1.Group = T2.Group
  19.     AND T1.SumOfCost <= T2.SumOfCost
  20.     AND T1.Product <= T2.Product
  21. GROUP BY Group, Product
  22. HAVING Count(*) = 1
Sep 27 '11 #8
Thanks for the consideration of my plight. I follow the creation of T1 and T2 queries that aggregate the data, but then my ability to follow gets hazy. I will try to work through your suggestions. I am curious, however, as to how the inner join works here. Rabbit, please explain the use of the <= operators and the (*) in the HAVING clause(?). And the initial SELECT statement, I assume I would need to be more specific in calling out the origial Testdf (Testdf.group, Testdf.product...)?
Sep 27 '11 #9
Rabbit
12,516 Expert Mod 8TB
I made an adjustment to the original query to qualify the columns in the select clause. There's no need to create multiple queries. That's just one query. The less than or equal to operator is to find the maximum cost per group. The count(*) is to get just the first such match.
Sep 27 '11 #10
NeoPa
32,556 Expert Mod 16PB
I tried this out as I still wasn't sure exactly what was going on. I had some understanding of why certain things were included but didn't get the feeling they were used quite as intended (I see no logic in comparing the [Product] field on a <= basis for instance). I had to qualify the Group & Product fields in the last GROUP BY clause (very minor mod) to get it to work, but these were the results I got (I don't think I materially changed your SQL but let me know if I have done so inadvertently anywhere) :

Expand|Select|Wrap|Line Numbers
  1. SELECT   T1.Group
  2.        , T1.Product
  3.        , MAX(T1.SumOfCost) AS MaxSumOfCost
  4. FROM     (SELECT   Group
  5.                  , Product
  6.                  , SUM(Cost) AS SumOfCost
  7.           FROM     Testdf
  8.           GROUP BY Group
  9.                  , Product
  10.          ) AS T1
  11.          INNER JOIN
  12.          (SELECT   Group
  13.                  , Product
  14.                  , SUM(Cost) AS SumOfCost
  15.           FROM     Testdf
  16.           GROUP BY Group
  17.                  , Product
  18.          ) AS T2
  19.   ON     (T1.Group = T2.Group)
  20.  AND     (T1.Product <= T2.Product)
  21.  AND     (T1.SumOfCost <= T2.SumOfCost)
  22. GROUP BY T1.Group
  23.        , T1.Product
  24. HAVING   Count(*) = 1
Expand|Select|Wrap|Line Numbers
  1. Group   Product MaxSumOfCost
  2. 1       itema   £472.48
  3. 1       itemc   £318.25
  4. 2       itema   £473.98
  5. 2       itemc   £409.03
  6. 3       itema   £437.36
  7. 3       itemc   £338.07
I'm assuming that, for the want of any idea to actually handle a TOP n, we're going straight to a TOP 1 (IE. Max).

I tried the following SQL, which I think should work logically, but it keeps prompting me for T1.Group :
Expand|Select|Wrap|Line Numbers
  1. SELECT   T1.Group
  2.        , T1.Product
  3.        , T1.SumOfCost
  4. FROM     (SELECT   [Group]
  5.                  , [Product]
  6.                  , Sum([Cost]) AS [SumOfCost]
  7.           FROM     [TestDF]
  8.           GROUP BY [Group]
  9.                  , [Product]
  10.          ) AS [T1]
  11. WHERE    T1.SumOfCost = (SELECT   Max([SumOfCost]) AS [MaxSumOfCost]
  12.                          FROM     (SELECT   [Product]
  13.                                           , Sum([Cost]) AS [SumOfCost]
  14.                                    FROM     [TestDF] AS [T2]
  15.                                    WHERE    (T2.Group = T1.Group)
  16.                                    GROUP BY [Product]
  17.                                   ) AS [T3]
  18.                         )
Frankly, most of the techniques I've used here I've picked up from Rabbit anyway, but I'm disappointed I was uanable to get this to work as expected in the time available (I have to go now so I'm calling a halt on it for now).
Sep 27 '11 #11
Rabbit
12,516 Expert Mod 8TB
I'll have to create a test version to see what's going on. I'll post back once I do so. The reason for the product comparison is for the off chance that the sum of two different products within the same group ends up being the same.
Sep 27 '11 #12
NeoPa
32,556 Expert Mod 16PB
Rabbit:
The reason for the product comparison is for the off chance that the sum of two different products within the same group ends up being the same.
Would that not require a simple equals comparison instead then?
Sep 27 '11 #13
Rabbit
12,516 Expert Mod 8TB
Here's the final SQL. I just had to tweak the join condition a little bit.
Expand|Select|Wrap|Line Numbers
  1. SELECT T1.[Group], T1.Product, MAX(T1.SumOfCost) AS MaxSumOfCost 
  2. FROM ( 
  3.     SELECT [Group], 
  4.        Product, 
  5.        SUM(Cost) AS SumOfCost 
  6.     FROM Testdf 
  7.     GROUP BY [Group], 
  8.        Product 
  9. ) T1 
  10. INNER JOIN ( 
  11.     SELECT [Group], 
  12.        Product, 
  13.        SUM(Cost) AS SumOfCost 
  14.     FROM Testdf 
  15.     GROUP BY [Group], 
  16.        Product 
  17. ) T2 
  18. ON T1.[Group] = T2.[Group] 
  19.     AND (T1.SumOfCost < T2.SumOfCost
  20.     OR (T1.SumOfCost = T2.SumOfCost 
  21.     AND T1.Product <= T2.Product))
  22. GROUP BY T1.[Group], T1.Product 
  23. HAVING Count(*) = 1
Sep 27 '11 #14
NeoPa
32,556 Expert Mod 16PB
I think I see it now Rabbit. It makes sense to compare the [Product]s as well as the sums on the basis that, even though the OP didn't specify how to handle values which were equal, it illustrates how it could be done if required.

The other point to make of course, is that the HAVING Count(*) = 1 line is really HAVING Count(*) = n as it would work for any TOP n requirement. A very sweet solution I must say, even if I confess to being a little disappointed a more straightforward one wasn't available even for the simpler (less sophisticated) Max() requirement.
Sep 30 '11 #15
Rabbit
12,516 Expert Mod 8TB
Well, I think a simpler, or rather easier to understand, solution is to use a subquery in the where clause. But I've found that this solution, for very large numbers of records, runs faster as it doesn't have to run a subquery for every row.
Sep 30 '11 #16
NeoPa
32,556 Expert Mod 16PB
Indeed. As I tried in my post #11, but which I couldn't get to work.
Sep 30 '11 #17

Sign in to post your reply or Sign up for a free account.

Similar topics

1
by: Randy K | last post by:
I have a table with some 35000 records and I need some help sorting it out. The goal is to get counts of failures modes oraganized by serial number. the table is set up roughly like this. s/n ...
41
by: Miguel Dias Moura | last post by:
Hello, I am working on an ASP.NET / VB page and I created a variable "query": Sub Page_Load(sender As Object, e As System.EventArgs) Dim query as String = String.Empty ... query =...
6
by: jjturon | last post by:
Can anyone help me?? I am trying to pass a Select Query variable to a table using Dlookup and return the value to same select query but to another field. Ex. SalesManID ...
4
by: reshmaah | last post by:
hi iam interested in generating a variable value by autoincrementing, in CGI using c as user enters some data, i want to generate an index for it, and store in a binary file with data, as...
2
polymorphic
by: polymorphic | last post by:
I am no longer good at Javascript and need help. I'm trying to embed pdf files in html then build some sort of navigation between the pdfs via the pdf numbered file name. I can generate the...
4
by: rendanime | last post by:
I need your help, i have designed a system using PHP. Now i want to create a report for projects. Currently the system can genarate report for all available project ( i manage to do this because is...
17
by: sharsy | last post by:
Hello guys, I would like some help in generating query criteria that will identify credit cards that have expired on an access database. The specific Field is formatted with a Data Type of...
2
by: I Hate My Computer | last post by:
I am using frames on a website. The title link on the title page adds a query string. The link goes to a page with two rows the second has two columns. I want the right column to be changed depending...
2
by: Looch | last post by:
All, I'm trying to output but I can only get (brackets for clarity) when using the code below. How can I "break" into the query variable in the InsertName method to add the name parameter to...
4
by: sonu2die4 | last post by:
Hi, I have the following perl script working for me.I am accesing the database from my perl script using use Net::Telnet(); package. I am not using DBI package.as I stated earlier the following...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.