I've got data in the following format: - Key Group Product Cost
-
1 1 itema 97.62
-
2 1 itema 139.12
-
3 1 itema 105.65
-
4 1 itema 130.09
-
5 1 itemb 88.95
-
6 1 itemb 124.14
-
7 1 itemc 90.78
-
8 1 itemc 111.21
-
9 1 itemc 29.70
-
10 1 itemc 86.56
-
11 2 itema 134.51
-
12 2 itema 132.92
-
13 2 itema 95.85
-
14 2 itema 110.70
-
15 2 itemb 70.24
-
16 2 itemb 104.59
-
17 2 itemc 83.69
-
18 2 itemc 120.05
-
19 2 itemc 104.20
-
20 2 itemc 101.09
-
21 3 itema 90.74
-
22 3 itema 127.02
-
23 3 itema 96.08
-
24 3 itema 123.52
-
25 3 itemb 129.32
-
26 3 itemb 85.67
-
27 3 itemc 114.09
-
28 3 itemc 72.28
-
29 3 itemc 60.02
-
30 3 itemc 91.68
...and I need to build a query that returns the top 1 (or n) products by sum of costs by group. I have tried a simple select top 1 (or n) query with grouping and sorting: - SELECT TOP 1 Testdf.group, Testdf.product, Sum(Testdf.cost) AS SumOfcost
-
FROM Testdf
-
GROUP BY Testdf.group, Testdf.product
-
ORDER BY Sum(Testdf.cost) DESC;
...but it only returns one value (the single highest sum of cost). What I need is a query that returns 1 (or n) rows per group that contains the prouct with the highest sum of cost, grouped by group (e.g., 1, 2...). While it may seem trivial with the dataset I provided, however, the dataset I'm using has millions of records... HELP! :)
16 3706 NeoPa 32,556
Expert Mod 16PB
The TOP predicate of the SELECT clause pertains to the query as a whole and is nowhere defined as working within each group. IE. What you are returning with that SQL is exactly what would be expected - One record (or more if there were records with equal values).
You could try out the following SQL and see if it works for you for TOP n, otherwise - SELECT sub1.Group
-
, sub2.Product
-
, sub2.SumOfCost
-
FROM (SELECT DISTINCT [Group]
-
FROM [TestDF]
-
) AS sub1
-
INNER JOIN
-
(SELECT TOP 2
-
[Group]
-
, [Product]
-
, Sum([Cost]) AS [SumOfCost]
-
FROM [TestDF] AS TDF
-
WHERE TDF.Group=sub1.Group
-
GROUP BY TDF.Group
-
, TDF.Product
-
ORDER BY [SumOfCost] DESC
-
) AS sub2
-
ON sub1.Group=sub2.Group
I've used TOP 2 simply as an illustration so, if this works, you can replace it with the value of your choice. I'm not in a good position to test it out though. Let me know how it goes when you do and we can take it from there.
I don't think you can refer to an outer query from a subquery in the FROM clause. However, you should be able to move that logic within the WHERE clause and it should work just as well.
Having said that, another method is to join the table to itself and use aggregates to achieve a similar result. - SELECT T1.Category, Max(T1.Price)
-
FROM tableName AS T1
-
INNER JOIN tableName AS T2
-
ON T1.Category = T2.Category
-
AND T1.Price <= T2.Price
-
AND T1.PK <= T2.PK
-
GROUP BY T1.Category
-
HAVING Count(*) = 1
NeoPa 32,556
Expert Mod 16PB
I suspect you may be right Rabbit, but you use of different field names makes it very hard for me to follow what you're trying to illustrate. Even if I change the names I can't see where it matches, but how much is wrong with the SQL and how much is my inability to follow it is not clear.
The SQL Guys are going to kill me, but if you wish to take the simpler approach, simply Pass the [Group] and [Product] Values in a GROUP BY Query to a Public Function that does all the work. The down side is that Processing Time will probably be much greater that its SQL counterpart. You decide. - SELECT TestDF.Group, TestDF.Product, fCalcMAXCostPerGroup([Group],[Product]) AS MAX_Cost_Per_Group
-
FROM TestDF
-
GROUP BY TestDF.Group, TestDF.Product, fCalcMAXCostPerGroup([Group],[Product])
-
ORDER BY TestDF.Group, TestDF.Product;
- Public Function fCalcMAXCostPerGroup(lngGroup As Long, strProduct As String) As Double
-
fCalcMAXCostPerGroup = DMax("[Cost]", "TestDF", "[Group] = " & lngGroup & _
-
" AND [Product] = '" & strProduct & "'")
-
End Function
Query OUTPUT based on Data supplied in Post #1: - Group Product MAX_Cost_Per_Group
-
1 itema 139.12
-
1 itemb 124.14
-
1 itemc 111.21
-
2 itema 134.51
-
2 itemb 104.59
-
2 itemc 120.05
-
3 itema 127.02
-
3 itemb 129.32
-
3 itemc 114.09
I like to change the names and use a more simplified example so that the OP has to put in some work.
But if I've confused you, then the OP will have a heck of a time.
Given the following data -
PK Category Price
-
1 A 5
-
2 A 7
-
3 A 2
-
4 B 10
-
5 B 10
-
The SQL join will result in -
T1.PK T1.Category T1.Price T2.PK
-
1 A 5 1
-
1 A 5 2
-
2 A 7 2
-
3 A 2 1
-
3 A 2 2
-
3 A 2 3
-
4 B 10 4
-
4 B 10 5
-
5 B 10 5
-
At this point, having written out this much out, I realize now that the SQL group by I posted is wrong. It should be grouped by the PK.
Anywho, those PK that have a count of 1 is as follows -
PK Category Price
-
2 A 7
-
5 B 10
-
NeoPa 32,556
Expert Mod 16PB
My main confusion was not so much that I was unable to cross-reference what you posted with the question at hand, as much as that I suspected there was no appropriate match. It didn't seem to me that your suggestion could be used in such a scenario. Unfortunately, to be sure I would have to reverse engineer your thinking, which I wasn't too keen to do in the circumstances. I appreciate your ongoing attempts to avoid simply providing solutions for the OP, but a side-effect is that it's extra difficult to review what you post for appropriateness.
As I say, I'm not convinced you have a workable solution, although I am convinced you're right when you say mine wouldn't work, but I don't want to try to check it out in encrypted form. As they say - Life's complicated enough.
To put it into the terms of the OP's problem, it would be this. -
SELECT T1.Group, T1.Product, MAX(T1.SumOfCost) AS MaxSumOfCost
-
FROM (
-
SELECT Group,
-
Product,
-
SUM(Cost) AS SumOfCost
-
FROM Testdf
-
GROUP BY Group,
-
Product
-
) AS T1
-
INNER JOIN (
-
SELECT Group,
-
Product,
-
SUM(Cost) AS SumOfCost
-
FROM Testdf
-
GROUP BY Group,
-
Product
-
) AS T2
-
ON T1.Group = T2.Group
-
AND T1.SumOfCost <= T2.SumOfCost
-
AND T1.Product <= T2.Product
-
GROUP BY Group, Product
-
HAVING Count(*) = 1
Thanks for the consideration of my plight. I follow the creation of T1 and T2 queries that aggregate the data, but then my ability to follow gets hazy. I will try to work through your suggestions. I am curious, however, as to how the inner join works here. Rabbit, please explain the use of the <= operators and the (*) in the HAVING clause(?). And the initial SELECT statement, I assume I would need to be more specific in calling out the origial Testdf (Testdf.group, Testdf.product...)?
I made an adjustment to the original query to qualify the columns in the select clause. There's no need to create multiple queries. That's just one query. The less than or equal to operator is to find the maximum cost per group. The count(*) is to get just the first such match.
NeoPa 32,556
Expert Mod 16PB
I tried this out as I still wasn't sure exactly what was going on. I had some understanding of why certain things were included but didn't get the feeling they were used quite as intended (I see no logic in comparing the [Product] field on a <= basis for instance). I had to qualify the Group & Product fields in the last GROUP BY clause (very minor mod) to get it to work, but these were the results I got (I don't think I materially changed your SQL but let me know if I have done so inadvertently anywhere) : - SELECT T1.Group
-
, T1.Product
-
, MAX(T1.SumOfCost) AS MaxSumOfCost
-
FROM (SELECT Group
-
, Product
-
, SUM(Cost) AS SumOfCost
-
FROM Testdf
-
GROUP BY Group
-
, Product
-
) AS T1
-
INNER JOIN
-
(SELECT Group
-
, Product
-
, SUM(Cost) AS SumOfCost
-
FROM Testdf
-
GROUP BY Group
-
, Product
-
) AS T2
-
ON (T1.Group = T2.Group)
-
AND (T1.Product <= T2.Product)
-
AND (T1.SumOfCost <= T2.SumOfCost)
-
GROUP BY T1.Group
-
, T1.Product
-
HAVING Count(*) = 1
- Group Product MaxSumOfCost
-
1 itema £472.48
-
1 itemc £318.25
-
2 itema £473.98
-
2 itemc £409.03
-
3 itema £437.36
-
3 itemc £338.07
I'm assuming that, for the want of any idea to actually handle a TOP n, we're going straight to a TOP 1 (IE. Max).
I tried the following SQL, which I think should work logically, but it keeps prompting me for T1.Group : - SELECT T1.Group
-
, T1.Product
-
, T1.SumOfCost
-
FROM (SELECT [Group]
-
, [Product]
-
, Sum([Cost]) AS [SumOfCost]
-
FROM [TestDF]
-
GROUP BY [Group]
-
, [Product]
-
) AS [T1]
-
WHERE T1.SumOfCost = (SELECT Max([SumOfCost]) AS [MaxSumOfCost]
-
FROM (SELECT [Product]
-
, Sum([Cost]) AS [SumOfCost]
-
FROM [TestDF] AS [T2]
-
WHERE (T2.Group = T1.Group)
-
GROUP BY [Product]
-
) AS [T3]
-
)
Frankly, most of the techniques I've used here I've picked up from Rabbit anyway, but I'm disappointed I was uanable to get this to work as expected in the time available (I have to go now so I'm calling a halt on it for now).
I'll have to create a test version to see what's going on. I'll post back once I do so. The reason for the product comparison is for the off chance that the sum of two different products within the same group ends up being the same.
NeoPa 32,556
Expert Mod 16PB Rabbit:
The reason for the product comparison is for the off chance that the sum of two different products within the same group ends up being the same.
Would that not require a simple equals comparison instead then?
Here's the final SQL. I just had to tweak the join condition a little bit. - SELECT T1.[Group], T1.Product, MAX(T1.SumOfCost) AS MaxSumOfCost
-
FROM (
-
SELECT [Group],
-
Product,
-
SUM(Cost) AS SumOfCost
-
FROM Testdf
-
GROUP BY [Group],
-
Product
-
) T1
-
INNER JOIN (
-
SELECT [Group],
-
Product,
-
SUM(Cost) AS SumOfCost
-
FROM Testdf
-
GROUP BY [Group],
-
Product
-
) T2
-
ON T1.[Group] = T2.[Group]
-
AND (T1.SumOfCost < T2.SumOfCost
-
OR (T1.SumOfCost = T2.SumOfCost
-
AND T1.Product <= T2.Product))
-
GROUP BY T1.[Group], T1.Product
-
HAVING Count(*) = 1
NeoPa 32,556
Expert Mod 16PB
I think I see it now Rabbit. It makes sense to compare the [Product]s as well as the sums on the basis that, even though the OP didn't specify how to handle values which were equal, it illustrates how it could be done if required.
The other point to make of course, is that the HAVING Count(*) = 1 line is really HAVING Count(*) = n as it would work for any TOP n requirement. A very sweet solution I must say, even if I confess to being a little disappointed a more straightforward one wasn't available even for the simpler (less sophisticated) Max() requirement.
Well, I think a simpler, or rather easier to understand, solution is to use a subquery in the where clause. But I've found that this solution, for very large numbers of records, runs faster as it doesn't have to run a subquery for every row.
NeoPa 32,556
Expert Mod 16PB
Indeed. As I tried in my post #11, but which I couldn't get to work.
Sign in to post your reply or Sign up for a free account.
Similar topics
by: Randy K |
last post by:
I have a table with some 35000 records and I need some help sorting it
out.
The goal is to get counts of failures modes oraganized by serial
number. the table is set up roughly like this.
s/n ...
|
by: Miguel Dias Moura |
last post by:
Hello,
I am working on an ASP.NET / VB page and I created a variable "query":
Sub Page_Load(sender As Object, e As System.EventArgs)
Dim query as String = String.Empty
...
query =...
|
by: jjturon |
last post by:
Can anyone help me??
I am trying to pass a Select Query variable to a table using Dlookup
and return the value to same select query but to another field.
Ex.
SalesManID ...
|
by: reshmaah |
last post by:
hi
iam interested in generating a variable value by autoincrementing, in CGI using c
as user enters some data, i want to generate an index for it,
and store in a binary file with data,
as...
|
by: polymorphic |
last post by:
I am no longer good at Javascript and need help. I'm trying to embed pdf files in html then build some sort of navigation between the pdfs via the pdf numbered file name.
I can generate the...
|
by: rendanime |
last post by:
I need your help, i have designed a system using PHP. Now i want to create a report for projects. Currently the system can genarate report for all available project ( i manage to do this because is...
|
by: sharsy |
last post by:
Hello guys,
I would like some help in generating query criteria that will identify
credit cards that have expired on an access database.
The specific Field is formatted with a Data Type of...
|
by: I Hate My Computer |
last post by:
I am using frames on a website. The title link on the title page adds a query string. The link goes to a page with two rows the second has two columns. I want the right column to be changed depending...
|
by: Looch |
last post by:
All,
I'm trying to output but I can only get (brackets for clarity) when using the code below.
How can I "break" into the query variable in the InsertName method to
add the name parameter to...
|
by: sonu2die4 |
last post by:
Hi,
I have the following perl script working for me.I am accesing the database from my perl script using use Net::Telnet(); package.
I am not using DBI package.as I stated earlier the following...
|
by: ryjfgjl |
last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
|
by: emmanuelkatto |
last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud.
Please let me know.
Thanks!
Emmanuel
|
by: BarryA |
last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
|
by: nemocccc |
last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
|
by: Hystou |
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers,...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
| |