Driving me nuts... need Avg(Sum(iif(condition,True,False))

5,501 Expert Mod 4TB

I have been doing this calculation in Excel for ages... very straight forward; however, the workbook has dozen of worksheets and it is a multi-step process... you get the picture :)

What I am doing is a method of standardization commonly
referred to as "area normalization"
Eventually I need to get to
RFC(i)= Known(i)/avg_area(i)
Cnst(i) = RFC(i)/RFC(0)
Result = (avg_area(i)*Cnst(i)) / sum(area(i)*Cnst(i))

where (i) is the constituent and (0) is the reference point in the analysis. avg_area is the average of the raw data point over several analysis.

So in the workbook I calculate the Cnst(i) that is used in the instrument, or workbook, to calculate results.

So I have the database normalized
PK = Primary Key; FK = Foreign Key to indicated table;
IK = Internal Key to the table (self join)

[T_Constituent]
[PK][Constituent_Name]
[1][A]
[2][b]
[3][C]
etc...

[T_System]
[PK][System_Name]
[1][X]
[2][Y]
[3][Z]
etc...

[T_Stanards]
[PK][Stanards_name][IK_Sample]
[1][ReferenceStd][1]
[2][SecondaryStd][1]
etc...
(the [IK] is used to relate the raw data back to the reference standard ([PK]=[IK])...)

[T_KnownValue]
[PK][FK_Sample][FK_Constituent][KnownValue_Value]
[1][1][1][1.05]
[2][1][2][2.50]
[3][1][3][5.00]
etc...
(Not sure if I should put this in a different table; however, these are not directly related to the results)

[T_Injections]
[PK][Inj_DateTime][FK_Sample][FK_System][Injection_use]
[1][2017-05-03 10:18:27][1][1][1]
[2][2017-05-03 11:18:27][1][1][1]
[3][2017-05-03 12:18:27][1][1][1]
[4][2017-05-03 10:18:27][1][2][1]
[5][2017-05-03 11:18:27][1][2][1]
[6][2017-05-03 12:18:27][1][2][1]
[7][2017-05-04 10:18:27][1][1][1]
[8][2017-05-04 11:18:27][1][1][1]
[9][2017-05-04 12:18:27][1][1][1]
[10][2017-05-04 10:18:27][1][2][1]
[11][2017-05-04 11:18:27][1][2][1]
[12][2017-05-04 12:18:27][1][2][1]

[T_InjectionResult] (these are raw results %area)
[PK][FK_Injection][FK_Constituent][Injection_Value]
[1] [1] [1] [0.98] 1
[2] [1] [2] [0.17] 1
[3] [1] [3] [1.05] 1
[4] [1] [4] [3.20] 1
[5] [1] [6] [40.38] 1
[6] [1] [8] [54.22] 1
[7] [2] [1] [1.09] 1
[8] [2] [2] [0.05] 1
[9] [2] [3] [1.04] 1
[10] [2] [4] [3.16] 1
[11] [2] [6] [40.41] 1
[12] [2] [8] [54.25] 1
[13] [3] [1] [1.02] 1
[14] [3] [3] [1.04] 1
[15] [3] [4] [3.13] 1
[16] [3] [6] [40.5] 1
[17] [3] [8] [54.31] 1
[18] [4] [1] [1.02] 1
[19] [4] [3] [1.04] 1
[20] [4] [4] [3.13] 1
[21] [4] [6] [40.54] 1
[22] [4] [8] [54.28] 1

I've crosstab-queried this and we have a nice table with the injection results for each system as rows and the constituents as columns.

The first step is the get the average of the areas for the day (each day is a batch run against the primary reference)
That would be easy enough; however, I need to sum a few of the constituents first such as "A" and "B":

Expand|Select|Wrap|Line Numbers

sum(iif([FK_Constituent]=1 or [FK_Constituent]=2,[Injection_Value],0))

[FK_Constituent]=2 doesn't appear for all samples
so for
[T_InjectionResult].[pk]=1 and 2
0.98+0.17 = 1.15
[T_InjectionResult].[pk]=7 and 8
1.09+0.05 = 1.14
In the cross tab, so long as I have the injections as individual rows (records) this works very nicely. Even in an aggregate I can get this to work so long as these are individual injections

Then average the two AVG(1.15+1.14)= 1.145
This is within the day and per system (the calibration is different for each system) and this is where things break.

I can get the correct sum in the Cross tab or in an aggregate if I do the individual injections, as soon as I group on date/system... the sum adds all of the values togeither (0.98+0.17+1.09+0.05) if by hand I divide by the injection count (in this case 2) this works

Of course I get an error

Expand|Select|Wrap|Line Numbers

avg(sum(iif([FK_Constituent]=1 or [FK_Constituent]=2,[Injection_Value],0)))

gave up on the crosstabs and I tried two queries
Count

Expand|Select|Wrap|Line Numbers

 SELECT Format([t_injection].[Injection_Date],"yyyy-mm-dd") AS InjDay, t_Injection.FK_Standard

, t_Injection.FK_System

, Count(t_Injection.PK_Injections) AS CountOfPK_Injections

FROM t_Injection

WHERE (((t_Injection.Injection_use)=1))

GROUP BY Format([t_injection].[Injection_Date],"yyyy-mm-dd")

, t_Injection.FK_Standard, t_Injection.FK_System;

Sum

Expand|Select|Wrap|Line Numbers

 SELECT Format([t_injection].[Injection_Date],"yyyy-mm-dd") AS InjDay, t_Injection.FK_Standard

, t_Injection.FK_System, Sum(IIf([t_Injection_result].[FK_Constituent]=1 Or [t_Injection_result].[FK_Constituent]=2,[t_injection_result].[Injection_Value],0)) AS SumAB

FROM t_Injection LEFT JOIN t_Injection_result ON t_Injection.PK_Injections = t_Injection_result.FK_Injection

WHERE (((t_Injection.Injection_use)=1))

GROUP BY Format([t_injection].[Injection_Date],"yyyy-mm-dd"), t_Injection.FK_Standard, t_Injection.FK_System;

If I try to innerjoin Q_sum with Q_Count then the counts are off (instead of 2 we have 17)

If I create a third query to bring both Q_sum and Q_count in, join on the fk_standard use the equation

Expand|Select|Wrap|Line Numbers

AvgAB: [zq_sum].[SumAB]/[zq_countinjperstandardbyday].[CountOfPK_Injections]

Then attempt to aggregate with the [AvgAB] then all of the records have the same value if I use as an expression then error:
"Your query does not include the specified 'AvgAB' as part of an aggregate function"

Ok, running on fumes and out of time for the day...
-Z

May 10 '17 #1

Subscribe Post Reply

1042

NeoPa

32,556

Expert Mod 16PB

Hi Z.

I don't come close to understanding your overall problem but I do have experience dealing with GROUP BYs/aggregate queries. Let me see if my first comments are helpful. If not so much then post the SQL of the query that gives the error message and we'll see if we can identify any problems.

The first thing to understand about aggregate queries (And please excuse me if I'm preaching to the choir.) is that every resulting field must be one of :

The input fields included in the GROUP BY clause.
Any of the other fields passed through an aggregation function such as Sum(), Avg(), First(), Max(), etc.
A expression which is a combination of types A and B above.

Everything must be aggregated in one, but only one, of the ways above.

Your error message ("Your query does not include the specified 'AvgAB' as part of an aggregate function") indicates that [AvgAB] has been referenced at a point where it hasn't been aggregated.

NB. the following clauses are either effective pre- or post- aggregation :

SELECT -> Post.
FROM -> Pre.
WHERE -> Pre.
GROUP BY -> Pre.
HAVING -> Post.
ORDER BY -> Post.

I hope this is some help :-)

May 11 '17 #2

zmbd

5,501

Expert Mod 4TB

Unfortunately that's where I'm at...

Look at the last two SQL in the OP
(sample === standard - I changed my mind on the names)

For the counting query I get:

Expand|Select|Wrap|Line Numbers

 
InjDay      FK_Standard FK_System CountOfPK_Injections

2017-05-03    1            1            3

2017-05-03    1            2            3

2017-05-03    1            3            3

2017-05-03    1            4            3

2017-05-03    2            3            4

2017-05-03    2            4            5

These are the correct counts per day, per system, per sample,

The second query for the sum

Expand|Select|Wrap|Line Numbers

 
InjDay     FK_Standard    FK_System   SumAnB

2017-05-03     1          1          3.44

2017-05-03     1          2          3.44

2017-05-03     1          3          3.31

2017-05-03     1          4          3.33

2017-05-03     2          3          4.02

2017-05-03     2          4          5.1

If you take

sigh... I think I just found it

I was just joining on FK_Standard between the two queries
Adding the join between both FK_Standard and FK_System.

I need to run the math by hand to verify the results.

If this works then I have A+B; D+E; F+G; C; H to pull in and then I have the Average of the raw results which I then need to potentially divide the known values by for the reference sample (RF(i)) to develop the final factors...
Simple equations and concept
RFC(i)= Known(i)/avg_area(i)
Cnst(i) = RFC(i)/RFC(0)
Result = (avg_area(i)*Cnst(i)) / sum(area(i)*Cnst(i))

IDK how many times just posting the question has helped me solve an issue!

As Always, thank you for being the sounding board... if this works I'll update this... I'm sure some other chemist is contemplating beating the head into the wall

May 11 '17 #3

zmbd

5,501

Expert Mod 4TB

The join between the two queries worked.
Bit of a kludge though to do it this way in that every time I add a constituent then I have to add this to the sum query and then to the join query

In the cross tab query I came up with
Field:

Expand|Select|Wrap|Line Numbers

 SumAB: 

   Sum(

      IIf(([t_Injection_result].[FK_Constituent]=1 

         Or [t_Injection_result].[FK_Constituent]=2)

         ,([t_Injection_result].[Injection_Value])

         ,(0))

      )

      /

     Sum(

         IIf(([t_Injection_result].[FK_Constituent]=0)

         ,(1)

         ,(0))

        )

Table: (null)
Total: Expression
Crosstab: Row Heading
In the reference sample/standard, there will always be a [PK_Constituent]=0 for all injections as this is a characteristic of the sample and is the reference point, if the raw value was (null) or zero then the entire normalization would fail on division by zero which is OK as the reference sample would be invalid/mis-made.

This will give me the average for the summed constituents
so I now have area(i) for RF(i)=knownvalue(i)/area(i)

The next steps will be to get the Cnst(i) based on the reference sample and then apply this against unknowns...

the saga continues; however, I think that's a new thread.

May 11 '17 #4

NeoPa

32,556

Expert Mod 16PB

As I say Z, most of that goes over my head. Understanding all that would take a level of context and detail understanding I just don't have the time and energy to invest. Much more than for you it would seem, as it's an area you're familiar with.

That said, I can understand snippets from time-to-time and if commenting on those can help - even just as a sounding-board - then I'm very happy to have helped and to observe the level of complexity you're dealing with.

Always a pleasure :-)

May 11 '17 #5

Seth Schrock

2,965

Expert 2GB

I have recently been working on a project that involved aggregate functions on a condition and was having similar problems. My solution was to one query that did the condition (in your case the iif(condition,True,False) and then base a second query on the first one that does the aggregation. Not sure if this would solve your problem or not, but thought I would suggest it.

May 12 '17 #6

zmbd

5,501

Expert Mod 4TB

Thank you NeoPa
I really wasn't expecting this to be so difficult.

For those that REALLY want to know more about what this thread is a step to, what I am doing is an analytical method call normalization of area response against known value,
(Not to be confused with database normalization :) ), often used in chromatography to convert raw percent peak areas into something more useful such as the percent mass of a compound found in the sample as it relates to a reference compound within the sample.

In Excel I start out with the known sample/reference
+ In first row the known values in the cells
A1, C1, D1, F1, H1
(in the raw responses we have A:H so the component names will be [A], [b], [C] ... [H])
So in the following anytime you see [A] that's the component
If you see just (letter) or (letter)(Number) that's a cell reference. [x]=one of the [A]-[H]

The reason I don't have a B1, E1, G1 known value is that in the end we use Const([A]) for both [A] and [b] and the same for [E] and [G]... I'll show that at the end :)

+ In the second through 4th rows the raw instrument values from the triplicate analysis through the instrument for the known. This tells us how the instrument is responding to the standard. (an aside, I've gone as high as 30 analysis of the known reference standard to establish response... see below)

+ Fifth row calculate the averages for each... however we group the value for A+B, D+E, and F+G:
A5=Avg((A2+B2),(A3+B3),(A4+B4)), C5=Avg(C1:C3), D5=Avg((D2+E2)...),...

+ Sixth row calculate the RF([x])
A6 = A1/A5, C6 = C1/C5, D6 = D1/D5 ....
Notice there's no RF([b]) because it's included in RF([A]) same for [E], and [G] (I'm getting to that :) )

+ Seventh row calculate the Const([x]) (no RF([b]) no Const([b]) same for [E] and [F]...
A7= A6/$F$6, C7= C6/$F$6, D7 = D6/$F$6 ....
obviously, because I'm using $F$6 when we come to
Const([F]) = F7 = F6/$F$6 = 1
Because [F] is the reference point that all of the other responses are being normalized (scaled to) against.

The seventh row are the Const([x]) that is then used against the unknown sample raw results to convert the raw values to the final result by standardizing (normalizing) the raw response. This is the number, Const([x]), I enter into the instrument software and that software does the following final calculations on the unknown for us... yea!

+ Sometimes we like to verify that the software is doing the calculations correctly for unknown samples.
So A10:H12 are used for the raw instrument response...
So very much like the reference sample the Column A+B etc... are added, however, this time instead of the average we multiply the corresponding value from row 7

+Normalized result:
Is the raw response multiplied by the corresponding Const([x])
So for just one result say row 10
A20 = A10 * A7
B20 = B10 * A7 (here's that combined Const([A+B])
C20 = C10 * C7
D20 = D10 * D7
E20 = E10 * D7
F20 = F10 * F7
G20 = G10 * F7
H20 = H10 * H7
I20 = sum(A20:H20) (used to rescale)

Do this for each analytical result (if three runs then we have rows 20 through 22 like this)

To find that final answer

Say drop to row 30 then for the first result we would have
[A] A30 = A20/$I$20
[b] B30 = B20/$I$20
[C] C30 = C20/$I$20
[D] D30 = D20/$I$20
[E] E30 = E20/$I$20
[F] F30 = F20/$I$20
[G] G30 = G20/$I$20
[H] H30 = H20/$I$20
and so on in the following rows for each result

And for the final report we have - if we ran in triplicate:
[A+B] = A35 = Avg(sum(A30,B30),sum(A31,B31),sum(A32,B32)...)
[C] = C35 = Avg(C30:C31...)
[D+E] = D35 = Avg(sum(D30,E30),sum(D31,E31),sum(D32,E32)...)
[F+G] = F35 = Avg(sum(F30,G30),sum(F31,G31),sum(F32,G32)...)
[H] = H35 = Avg(H30:H31...)
(the "..." is if I have more than three analytical runs, I'll do as many as 30 or 40 analysis against an unknown (or the known for that fact) when doing validations, co-labs, secondary standards, etc... in statistics that 30 is an arbitrary magic number and I am assuming a normal distribution Sampling Distribution of the sample (KhanAcademy))

Of course this become much more interesting when running both the reference and then unknown on multiple instruments and for different reference standards....
(so, on a given day, run of 30 injections on each instrument (4 in my case) gives me 120 results. I can then look at the individual groups of 30 from each instrument and with ANOVA or T-Test each instrument should be reporting the "same" value (yes simplified explanation) which if they are all in agreement then I can then take the 120 results and use those numbers to give me the population mean, and the mean and the median should be the same and if not then I have other issues, calculate the intra-lab error and other interesting statistical information. If I do this over several days, then I have a larger set of data which allows me to see how the lab accuracy and precession is holding from day to day - I don't normally run a sample 30 times on a daily basis - I'd never get any other work done!)

Nice thing here is that for the software that runs the instruments we "calibrate" by calculating the Const([x]) and the underlying Oracle database does the calculation against the unknown, that is to say
We calculate A7:F7 and enter these into the software
The Software then does the row 10 through 30 for us... I could have it do the row 35; however, we need to if [b], [E], and [G] are also being detected so we do that final calculation by hand.

ok, time for an ice-bath for the brain.

May 12 '17 #7

zmbd

5,501

Expert Mod 4TB

Seth Said:My solution was to one query that did the condition (in your case the iif(condition,True,False) and then base a second query on the first one that does the aggregation.

Not sure how that would work with my dataset, taking two injection result from op we have:
[T_InjectionResult]
[PK][FK_Injection][FK_Constituent][Injection_Value]
[1] [1] [1] [0.98]
[2] [1] [2] [0.17]
[3] [1] [3] [1.05]
[4] [1] [4] [3.20]
[5] [1] [6] [40.38]
[6] [1] [8] [54.22]
-
[7] [2] [1] [1.09]
[8] [2] [2] [0.05]
[9] [2] [3] [1.04]
[10] [2] [4] [3.16]
[11] [2] [6] [40.41]
[12] [2] [8] [54.25]
-
We're after avg(Sum(([FK_Constituent]=1,[FK_Constituent]=2)Injection_Value)
0.98+0.17 = 1.15
1.09+0.05 = 1.14
average(1.15,1.14)=1.145

The only way I see this working is in a crosstab query
CTQ with the conditional:

Expand|Select|Wrap|Line Numbers

 Sum_AB: 

    Sum(

       IIf(([t_Injection_result].[FK_Constituent]=1 

          Or [t_Injection_result].[FK_Constituent]=2)

          ,([t_Injection_result].[Injection_Value])

          ,(0)))

as a row heading
Using the join against [T_Injection]. This works (added the column headings 1 - 8 in the properties )

Expand|Select|Wrap|Line Numbers

 [FK_S][PK_I][FK_Sys][InjDate   ][Sum12 ][Sum45][Sum67][1     ][2    ][3   ][4   ][5   ][6    ][7   ][8    ]

[1   ][17  ][2     ][2017-05-03][1.14 ][3.16 ][40.41][1.09][0.05][1.04][3.16][null][40.41][null][54.25]

[1   ][18]  [2     ][2017-05-03][1.15 ][3.20 ][40.38][0.98][0.17][1.05][3.20][null][40.38][null][54.22]

FK_S refers to the sample table PK_I refers to the injection table

Then a second query based on the CTQ... this is a bit kludgy (not really too much more than my current solution :-) )... because not every injection has Constituent 1 - 8 I had to add those as headings to the CTQ; thus, if I were to add a 9th (1 - 9) this means that I have to add this to the heading property and add this to the second query (of course there's VBA)...

here I have to drop the fk_injection to get the averages however that's ok in this instance as I'm working with the standad... the unknown may have other issues (that's another thread I think)

Expand|Select|Wrap|Line Numbers

 [FK_S][FK_Sys][InjDate   ][AvgOfSum12][AvgOfSum45][AvgOfSum67][AvgOf1][AvgOf2][AvgOf3][AvgOf4][AvgOf5][AvgOf6][AvgOf7][AvgOf8]

[1   ][2     ][2017-05-03][1.1467    ][3.1733    ][40.3967   ][1.0567][0.0900][1.0433][3.1733][Null  ][40.397][Null  ][54.240]

><Update on the Math ><
I have a third injection in the data base

[PK][FK_Injection][FK_Constituent][Injection_Value]
[23] [39] [1] [1.10]
[24] [40] [2] [0.05]

So now we have
0.98+0.17 = 1.15
1.09+0.05 = 1.14
1.10+0.05 = 1.15
average(1.15,1.14,1.15)=1.14(66...)
Which does appear to be correct, which is why I didn't see any error between the current workbook and the database when I cross checked, because they are both using the same dataset.
Too many trees, just too many trees... (@~@)

May 12 '17 #8

NeoPa

32,556

Expert Mod 16PB

@Z & Seth.
Related to your post Seth.

While that can and does work, it shouldn't ever be necessary. That is an approach generally used by those that struggle with the complexities and want to break it down into more manageable chunks.

You'll probably realise, as I have from reading some of the complex sh*t Z gets up to in this thread, that he has no such difficulty dealing with such complexities.

When used properly (in line with my explanation in post #2) mixing input- and output-level items, and aggregation functions of course, in a GROUP BY query is very flexible.

May 13 '17 #9

zmbd

5,501

Expert Mod 4TB

Two things, I got bored with the IIF()
So just for fun switched to the Switch()
and it just bugged me using two queries

We have the aggregate query.
Field to do the count of aggregated records [Zcnt] (as expression)
The trick here is that constituent {F}, aka 6, must always be present in the reference standard; thus, allowing for a conditional count/sum

Expand|Select|Wrap|Line Numbers

Count((Switch(([t_Injection_result].[FK_Constituent]=6),(1))))

Field to sum on the conditional [Sum1_2] (as expression)

Expand|Select|Wrap|Line Numbers

 Sum(Switch(([t_Injection_result].[FK_Constituent]=1 

      Or [t_Injection_result].[FK_Constituent]=2)

      ,([t_Injection_result].[Injection_Value])))

Finally the field to do the average [Avg1_2] (as expression)

Expand|Select|Wrap|Line Numbers

[Sum1_2]/[zcnt]

So the full query with joins is (ok, I've only shown for Constituents {A} and {B}, aka 1 and 2, but one should see that we calc the sum and then the avg for each constituent or Constituent-group) :

Expand|Select|Wrap|Line Numbers

 SELECT Format([t_injection].[Injection_Date],"yyyy-mm-dd") AS InjDay

   , t_Injection.FK_Standard, t_Injection.FK_System

   , Count((Switch(([t_Injection_result].[FK_Constituent]=6),(1)))) AS zcnt

   , Sum(Switch(([t_Injection_result].[FK_Constituent]=1 

      Or [t_Injection_result].[FK_Constituent]=2)

      ,([t_Injection_result].[Injection_Value]))) AS Sum1_2

   , [Sum1_2]/[zcnt] AS Avg1_2

FROM t_Injection LEFT JOIN t_Injection_result 

   ON t_Injection.PK_Injections = t_Injection_result.FK_Injection

WHERE (((t_Injection.Injection_use)=1))

GROUP BY Format([t_injection].[Injection_Date],"yyyy-mm-dd")

   , t_Injection.FK_Standard, t_Injection.FK_System;

Giving me the final result of:

Expand|Select|Wrap|Line Numbers

 [InjDay    ][FK_Standard][FK_System][zcnt][Sum1_2][Sum4_5][Sum6_7 ][Avg1_2][Avg4_5][Avg6_7 ]

[2017-05-03][1          ][2        ][3.00][3.440 ][9.520 ][121.190][1.1467][3.1733][40.3967]

which is the same result as in my last post... and checks out against the excel-workbook/worksheet.
I tried this same approach using IIF() with the same results, just an extra few commas and zeros.

Still a kludge in that I have to do this same thing for each constituent (so there will be [Sum3],[Sum8], [Avg3],[Avg8]) and if I add constituents to [t_constituent] I'll have to hand enter it into the query.

At this point I should be able to Add the RF([x]) into this query (we'll see how that join works) and then then the final Const([x])

Would be so much simpler if Access would do the average on the sums, Avg(Sum(IIF())), as a straight forward aggregate calculation.

May 14 '17 #10

Driving me nuts... need Avg(Sum(iif(condition,True,False))

Similar topics