Generating a query of top n by sum of a variable

I've got data in the following format:

Expand|Select|Wrap|Line Numbers

 Key  Group Product   Cost

1      1    itema   97.62

2      1    itema  139.12

3      1    itema  105.65

4      1    itema  130.09

5      1    itemb   88.95

6      1    itemb  124.14

7      1    itemc   90.78

8      1    itemc  111.21

9      1    itemc   29.70

10     1    itemc   86.56

11     2    itema  134.51

12     2    itema  132.92

13     2    itema   95.85

14     2    itema  110.70

15     2    itemb   70.24

16     2    itemb  104.59

17     2    itemc   83.69

18     2    itemc  120.05

19     2    itemc  104.20

20     2    itemc  101.09

21     3    itema   90.74

22     3    itema  127.02

23     3    itema   96.08

24     3    itema  123.52

25     3    itemb  129.32

26     3    itemb   85.67

27     3    itemc  114.09

28     3    itemc   72.28

29     3    itemc   60.02

30     3    itemc   91.68

...and I need to build a query that returns the top 1 (or n) products by sum of costs by group. I have tried a simple select top 1 (or n) query with grouping and sorting:

Expand|Select|Wrap|Line Numbers

 SELECT TOP 1 Testdf.group, Testdf.product, Sum(Testdf.cost) AS SumOfcost

FROM Testdf

GROUP BY Testdf.group, Testdf.product

ORDER BY Sum(Testdf.cost) DESC;

...but it only returns one value (the single highest sum of cost). What I need is a query that returns 1 (or n) rows per group that contains the prouct with the highest sum of cost, grouped by group (e.g., 1, 2...). While it may seem trivial with the dataset I provided, however, the dataset I'm using has millions of records... HELP! :)

Sep 26 '11 #1

Subscribe Post Reply

3706

NeoPa

32,556

Expert Mod 16PB

The TOP predicate of the SELECT clause pertains to the query as a whole and is nowhere defined as working within each group. IE. What you are returning with that SQL is exactly what would be expected - One record (or more if there were records with equal values).

You could try out the following SQL and see if it works for you for TOP n, otherwise

Expand|Select|Wrap|Line Numbers

 SELECT sub1.Group

     , sub2.Product

     , sub2.SumOfCost

FROM   (SELECT DISTINCT [Group]

        FROM   [TestDF]

       ) AS sub1

       INNER JOIN

       (SELECT TOP 2

                 [Group]

               , [Product]

               , Sum([Cost]) AS [SumOfCost]

        FROM     [TestDF] AS TDF

        WHERE    TDF.Group=sub1.Group

        GROUP BY TDF.Group

               , TDF.Product

        ORDER BY [SumOfCost] DESC

       ) AS sub2

  ON   sub1.Group=sub2.Group

I've used TOP 2 simply as an illustration so, if this works, you can replace it with the value of your choice. I'm not in a good position to test it out though. Let me know how it goes when you do and we can take it from there.

Sep 26 '11 #2

Rabbit

12,516

Expert Mod 8TB

I don't think you can refer to an outer query from a subquery in the FROM clause. However, you should be able to move that logic within the WHERE clause and it should work just as well.

Having said that, another method is to join the table to itself and use aggregates to achieve a similar result.

Expand|Select|Wrap|Line Numbers

 SELECT T1.Category, Max(T1.Price)

FROM tableName AS T1

INNER JOIN tableName AS T2

ON T1.Category = T2.Category

   AND T1.Price <= T2.Price

   AND T1.PK <= T2.PK

GROUP BY T1.Category

HAVING Count(*) = 1

Sep 26 '11 #3

NeoPa

32,556

Expert Mod 16PB

I suspect you may be right Rabbit, but you use of different field names makes it very hard for me to follow what you're trying to illustrate. Even if I change the names I can't see where it matches, but how much is wrong with the SQL and how much is my inability to follow it is not clear.

Sep 26 '11 #4

ADezii

8,834

Expert 8TB

The SQL Guys are going to kill me, but if you wish to take the simpler approach, simply Pass the [Group] and [Product] Values in a GROUP BY Query to a Public Function that does all the work. The down side is that Processing Time will probably be much greater that its SQL counterpart. You decide.

Expand|Select|Wrap|Line Numbers

 SELECT TestDF.Group, TestDF.Product, fCalcMAXCostPerGroup([Group],[Product]) AS MAX_Cost_Per_Group

FROM TestDF

GROUP BY TestDF.Group, TestDF.Product, fCalcMAXCostPerGroup([Group],[Product])

ORDER BY TestDF.Group, TestDF.Product;

Expand|Select|Wrap|Line Numbers

 Public Function fCalcMAXCostPerGroup(lngGroup As Long, strProduct As String) As Double

  fCalcMAXCostPerGroup = DMax("[Cost]", "TestDF", "[Group] = " & lngGroup & _

                         " AND [Product] = '" & strProduct & "'")

End Function

Query OUTPUT based on Data supplied in Post #1:

Expand|Select|Wrap|Line Numbers

 Group    Product    MAX_Cost_Per_Group

1        itema      139.12

1        itemb      124.14

1        itemc      111.21

2        itema      134.51

2        itemb      104.59

2        itemc      120.05

3        itema      127.02

3        itemb      129.32

3        itemc      114.09

Sep 26 '11 #5

Rabbit

12,516

Expert Mod 8TB

I like to change the names and use a more simplified example so that the OP has to put in some work.

But if I've confused you, then the OP will have a heck of a time.

Given the following data

Expand|Select|Wrap|Line Numbers

 
PK Category Price

1  A        5

2  A        7

3  A        2

4  B        10

5  B        10

The SQL join will result in

Expand|Select|Wrap|Line Numbers

 
T1.PK T1.Category T1.Price T2.PK

1     A           5        1

1     A           5        2

2     A           7        2

3     A           2        1

3     A           2        2

3     A           2        3

4     B           10       4

4     B           10       5

5     B           10       5

At this point, having written out this much out, I realize now that the SQL group by I posted is wrong. It should be grouped by the PK.

Anywho, those PK that have a count of 1 is as follows

Expand|Select|Wrap|Line Numbers

 
PK Category Price

2  A        7

5  B        10

Sep 26 '11 #6

NeoPa

32,556

Expert Mod 16PB

My main confusion was not so much that I was unable to cross-reference what you posted with the question at hand, as much as that I suspected there was no appropriate match. It didn't seem to me that your suggestion could be used in such a scenario. Unfortunately, to be sure I would have to reverse engineer your thinking, which I wasn't too keen to do in the circumstances. I appreciate your ongoing attempts to avoid simply providing solutions for the OP, but a side-effect is that it's extra difficult to review what you post for appropriateness.

As I say, I'm not convinced you have a workable solution, although I am convinced you're right when you say mine wouldn't work, but I don't want to try to check it out in encrypted form. As they say - Life's complicated enough.

Sep 26 '11 #7

Rabbit

12,516

Expert Mod 8TB

To put it into the terms of the OP's problem, it would be this.

Expand|Select|Wrap|Line Numbers

 
SELECT T1.Group, T1.Product, MAX(T1.SumOfCost) AS MaxSumOfCost

FROM (

SELECT Group,

   Product,

   SUM(Cost) AS SumOfCost

FROM Testdf

GROUP BY Group,

   Product

) AS T1

INNER JOIN (

SELECT Group,

   Product,

   SUM(Cost) AS SumOfCost

FROM Testdf

GROUP BY Group,

   Product

) AS T2

ON T1.Group = T2.Group

    AND T1.SumOfCost <= T2.SumOfCost

    AND T1.Product <= T2.Product

GROUP BY Group, Product

HAVING Count(*) = 1

Sep 27 '11 #8

connerphd

Thanks for the consideration of my plight. I follow the creation of T1 and T2 queries that aggregate the data, but then my ability to follow gets hazy. I will try to work through your suggestions. I am curious, however, as to how the inner join works here. Rabbit, please explain the use of the <= operators and the (*) in the HAVING clause(?). And the initial SELECT statement, I assume I would need to be more specific in calling out the origial Testdf (Testdf.group, Testdf.product...)?

Sep 27 '11 #9

Rabbit

12,516

Expert Mod 8TB

I made an adjustment to the original query to qualify the columns in the select clause. There's no need to create multiple queries. That's just one query. The less than or equal to operator is to find the maximum cost per group. The count(*) is to get just the first such match.

Sep 27 '11 #10

NeoPa

32,556

Expert Mod 16PB

I tried this out as I still wasn't sure exactly what was going on. I had some understanding of why certain things were included but didn't get the feeling they were used quite as intended (I see no logic in comparing the [Product] field on a <= basis for instance). I had to qualify the Group & Product fields in the last GROUP BY clause (very minor mod) to get it to work, but these were the results I got (I don't think I materially changed your SQL but let me know if I have done so inadvertently anywhere) :

Expand|Select|Wrap|Line Numbers

 SELECT   T1.Group

       , T1.Product

       , MAX(T1.SumOfCost) AS MaxSumOfCost

FROM     (SELECT   Group

                 , Product

                 , SUM(Cost) AS SumOfCost

          FROM     Testdf

          GROUP BY Group

                 , Product

         ) AS T1

         INNER JOIN

         (SELECT   Group

                 , Product

                 , SUM(Cost) AS SumOfCost

          FROM     Testdf

          GROUP BY Group

                 , Product

         ) AS T2

  ON     (T1.Group = T2.Group)

 AND     (T1.Product <= T2.Product)

 AND     (T1.SumOfCost <= T2.SumOfCost)

GROUP BY T1.Group

       , T1.Product

HAVING   Count(*) = 1

Expand|Select|Wrap|Line Numbers

 Group   Product MaxSumOfCost

1       itema   £472.48

1       itemc   £318.25

2       itema   £473.98

2       itemc   £409.03

3       itema   £437.36

3       itemc   £338.07

I'm assuming that, for the want of any idea to actually handle a TOP n, we're going straight to a TOP 1 (IE. Max).

I tried the following SQL, which I think should work logically, but it keeps prompting me for T1.Group :

Expand|Select|Wrap|Line Numbers

 SELECT   T1.Group

       , T1.Product

       , T1.SumOfCost

FROM     (SELECT   [Group]

                 , [Product]

                 , Sum([Cost]) AS [SumOfCost]

          FROM     [TestDF]

          GROUP BY [Group]

                 , [Product]

         ) AS [T1]

WHERE    T1.SumOfCost = (SELECT   Max([SumOfCost]) AS [MaxSumOfCost]

                         FROM     (SELECT   [Product]

                                          , Sum([Cost]) AS [SumOfCost]

                                   FROM     [TestDF] AS [T2]

                                   WHERE    (T2.Group = T1.Group)

                                   GROUP BY [Product]

                                  ) AS [T3]

                        )

Frankly, most of the techniques I've used here I've picked up from Rabbit anyway, but I'm disappointed I was uanable to get this to work as expected in the time available (I have to go now so I'm calling a halt on it for now).

Sep 27 '11 #11

Rabbit

12,516

Expert Mod 8TB

I'll have to create a test version to see what's going on. I'll post back once I do so. The reason for the product comparison is for the off chance that the sum of two different products within the same group ends up being the same.

Sep 27 '11 #12

NeoPa

32,556

Expert Mod 16PB

Rabbit:
The reason for the product comparison is for the off chance that the sum of two different products within the same group ends up being the same.

Would that not require a simple equals comparison instead then?

Sep 27 '11 #13

Rabbit

12,516

Expert Mod 8TB

Here's the final SQL. I just had to tweak the join condition a little bit.

Expand|Select|Wrap|Line Numbers

 SELECT T1.[Group], T1.Product, MAX(T1.SumOfCost) AS MaxSumOfCost 

FROM ( 

    SELECT [Group], 

       Product, 

       SUM(Cost) AS SumOfCost 

    FROM Testdf 

    GROUP BY [Group], 

       Product 

) T1 

INNER JOIN ( 

    SELECT [Group], 

       Product, 

       SUM(Cost) AS SumOfCost 

    FROM Testdf 

    GROUP BY [Group], 

       Product 

) T2 

ON T1.[Group] = T2.[Group] 

    AND (T1.SumOfCost < T2.SumOfCost

    OR (T1.SumOfCost = T2.SumOfCost 

    AND T1.Product <= T2.Product))

GROUP BY T1.[Group], T1.Product 

HAVING Count(*) = 1

Sep 27 '11 #14

NeoPa

32,556

Expert Mod 16PB

I think I see it now Rabbit. It makes sense to compare the [Product]s as well as the sums on the basis that, even though the OP didn't specify how to handle values which were equal, it illustrates how it could be done if required.

The other point to make of course, is that the HAVING Count(*) = 1 line is really HAVING Count(*) = n as it would work for any TOP n requirement. A very sweet solution I must say, even if I confess to being a little disappointed a more straightforward one wasn't available even for the simpler (less sophisticated) Max() requirement.

Sep 30 '11 #15

Rabbit

12,516

Expert Mod 8TB

Well, I think a simpler, or rather easier to understand, solution is to use a subquery in the where clause. But I've found that this solution, for very large numbers of records, runs faster as it doesn't have to run a subquery for every row.

Sep 30 '11 #16

NeoPa

32,556

Expert Mod 16PB

Indeed. As I tried in my post #11, but which I couldn't get to work.

Sep 30 '11 #17

Similar topics

query variable string text with counts

by: Randy K | last post by:

I have a table with some 35000 records and I need some help sorting it out. The goal is to get counts of failures modes oraganized by serial number. the table is set up roughly like this. s/n ...

Microsoft Access / VBA

How to declare a variable in the global scope?

by: Miguel Dias Moura | last post by:

Hello, I am working on an ASP.NET / VB page and I created a variable "query": Sub Page_Load(sender As Object, e As System.EventArgs) Dim query as String = String.Empty ... query =...

ASP.NET

Use Select Query variable against a table and return value to same select query

by: jjturon | last post by:

Can anyone help me?? I am trying to pass a Select Query variable to a table using Dlookup and return the value to same select query but to another field. Ex. SalesManID ...

Microsoft Access / VBA

generating autoincrement value

by: reshmaah | last post by:

hi iam interested in generating a variable value by autoincrementing, in CGI using c as user enters some data, i want to generate an index for it, and store in a binary file with data, as...

C / C++

Passing a JavaScript variable as the src of an EMBED

by: polymorphic | last post by:

I am no longer good at Javascript and need help. I'm trying to embed pdf files in html then build some sort of navigation between the pdfs via the pdf numbered file name. I can generate the...

Javascript

Generating reports, variable is undefined.

by: rendanime | last post by:

I need your help, i have designed a system using PHP. Now i want to create a report for projects. Currently the system can genarate report for all available project ( i manage to do this because is...

PHP

Query Criteria to Identify Expired Credit Cards

by: sharsy | last post by:

Hello guys, I would like some help in generating query criteria that will identify credit cards that have expired on an access database. The specific Field is formatted with a Data Type of...

Microsoft Access / VBA

Pass Query String to Frame src using Javascript

by: I Hate My Computer | last post by:

I am using frames on a website. The title link on the title page adds a query string. The link goes to a page with two rows the second has two columns. I want the right column to be changed depending...

Javascript

Method param in string variable

by: Looch | last post by:

All, I'm trying to output but I can only get (brackets for clarity) when using the code below. How can I "break" into the query variable in the InsertName method to add the name parameter to...

C# / C Sharp

Perl Database SQL query variable passing

by: sonu2die4 | last post by:

Hi, I have the following perl script working for me.I am accesing the database from my perl script using use Net::Telnet(); package. I am not using DBI package.as I stated earlier the following...

Perl

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server