Grouping Query Range

2,653

Expert 2GB

Hi, JC.

Partition function will return value w/o right bound whenever its Number argument higher then its stop argument

example
? Partition(1,0,5,2)
0: 1
? Partition(2,0,5,2)
2: 3
? Partition(5,0,5,2)
4: 5
? Partition(6,0,5,2)
6:
? Partition(16,0,5,2)
6:

You may reformat Partition() function output to what you like or write your own VBA function doing the same.
Additionally there is a pure SQL solution but it requires additional table.

Regards,
Fish

May 27 '08 #7

JC2710

Hi, JC.

Partition function will return value w/o right bound whenever its Number argument higher then its stop argument

example
? Partition(1,0,5,2)
0: 1
? Partition(2,0,5,2)
2: 3
? Partition(5,0,5,2)
4: 5
? Partition(6,0,5,2)
6:
? Partition(16,0,5,2)
6:

You may reformat Partition() function output to what you like or write your own VBA function doing the same.
Additionally there is a pure SQL solution but it requires additional table.

Regards,
Fish

OK! Thanks for all your help! Ive managed to solve the problem by having a query with 6 subqueries in it! Seems to work fine!

Thanks for all your help

May 27 '08 #8

2,545

Expert Mod 2GB

Hi. The most flexible way to do what you ask is as Fish indicated - use a separate band table. That way you can use bands with unequal boundaries (as illustrated in the attached zipped database).

The banding table is added to the base data without using INNER, LEFT or RIGHT JOINs; instead, the rows of the banding table are selected where the value being compared is between the band minimum and maximum. The query with the joined banding can then be used as the basis for crosstabs, simple counts, histograms and so on.

I attach in zip form a sample database (A2003 format) which provides a simple example of age banding data. The age band table is used in the queries to provide the base data for histograms.

The extract is age, category and gender of a set of students, and illustrates in cut-down form how such banding works in practice. I use this technique (on a larger dataset) to perform age-related analysis of student data by gender, course type, location, campus, course, qualification and so on - all based on a single overarching query providing a view of the student data banded by age band.

-Stewart

Attached Files

Histogram.zip (19.4 KB, 322 views)

May 27 '08 #9

ADezii

8,834

Expert 8TB

Hi. The most flexible way to do what you ask is as Fish indicated - use a separate band table. That way you can use bands with unequal boundaries (as illustrated in the attached zipped database).

The banding table is added to the base data without using INNER, LEFT or RIGHT JOINs; instead, the rows of the banding table are selected where the value being compared is between the band minimum and maximum. The query with the joined banding can then be used as the basis for crosstabs, simple counts, histograms and so on.

I attach in zip form a sample database (A2003 format) which provides a simple example of age banding data. The age band table is used in the queries to provide the base data for histograms.

The extract is age, category and gender of a set of students, and illustrates in cut-down form how such banding works in practice. I use this technique (on a larger dataset) to perform age-related analysis of student data by gender, course type, location, campus, course, qualification and so on - all based on a single overarching query providing a view of the student data banded by age band.

-Stewart

Hello and thanks for the reply, Stewart. I'd like to see that Attached DB whenever you get a chance.

May 27 '08 #10

2,545

Expert Mod 2GB

Should be visible now, ADezii - if not please let me know.

-Stewart

Hello and thanks for the reply, Stewart. I'd like to see that Attached DB whenever you get a chance.

May 27 '08 #11

ADezii

8,834

Expert 8TB

Should be visible now, ADezii - if not please let me know.

-Stewart

I know I'm getting older, Stewart, but I still can't see anything. No comment necessary, NeoPa! (LOL).

May 27 '08 #12

NeoPa

32,556

Expert Mod 16PB

Subscribing .

May 27 '08 #13

2,545

Expert Mod 2GB

Hi ADezii. It's attached on post #9 - at least I can see it and it is listed as one view, so someone else can too...

By the way, although the example uses a Count query any form of totalling can be done on the banded data - Sum (which the examples in post 1 would require), Average, Min, Max, whatever.

-Stewart

I know I'm getting older, Stewart, but I still can't see anything. No comment necessary, NeoPa! (LOL).

May 27 '08 #14

ADezii

8,834

Expert 8TB

Hi ADezii. It's attached on post #9 - at least I can see it and it is listed as one view, so someone else can too...

By the way, although the example uses a Count query any form of totalling can be done on the banded data - Sum (which the examples in post 1 would require), Average, Min, Max, whatever.

-Stewart

Thanks Stewart for the explanation and Attachment, interesting concept.

May 28 '08 #15

NeoPa

32,556

Expert Mod 16PB

Hi. The most flexible way to do what you ask is as Fish indicated - use a separate band table. That way you can use bands with unequal boundaries (as illustrated in the attached zipped database).

The banding table is added to the base data without using INNER, LEFT or RIGHT JOINs; instead, the rows of the banding table are selected where the value being compared is between the band minimum and maximum.
...

Nice idea Stewart.

I'm curious as to why you wouldn't use an INNER JOIN to link the tables though?

I would expect that to work better than a simple WHERE clause.

May 28 '08 #16

2,545

Expert Mod 2GB

Hi NeoPa. Reason for no joins is that there are no common fields involved. The comparison is of a single discrete value in one table against a minimum to maximum range of values in the banding table, not key against key. It is the principle of the histogram, where discrete values of a population are grouped over specified intervals into separate but contiguous bands, with counts for each band subsequently displayed in the typical column chart form.

This kind of banding application is one of those rare occasions when such a Cartesian join between tables is essential, with the inherent multiplication of rows stopped by the banding range always matching just one of the rows in the banding table.

The reason a band number is included is to be able to order the bands correctly if required - and to be able to left-join the banding table back to the results later if necessary (because, depending on the range of the data, the result bandings are likely to be non-contiguous, and left-joining the bandings back will allow inclusion of all bands, even if there are no results for particular bands).

Hope this helps to explain the approach.

-Stewart

May 28 '08 #17

2,653

Expert 2GB

I agree with NeoPa. I would prefer to JOIN tables though with only two tables involved it seems not to have a great advantage.

Regards,
Fish

May 28 '08 #18

2,545

Expert Mod 2GB

I agree entirely, Fish. When working with relational tables there are always joins involved, and in all of my applications I define the joins and set relational constraints as a matter of routine. However, the banding application is one of these rare occasions when this is simply not possible - in set algebra terms the sets involved are disjoint and cannot be equijoined or outer joined.

When I analyse several years worth of student data, involving around 30,000 rows, the time penalties and delays are in the subsequent multiple-field grouping and summations not in the formation of a banded base table involving joining of the 30,000 rows to a small table of around 8 rows using a where clause to restrict the rows joined.

-Stewart

May 28 '08 #19

2,653

Expert 2GB

Well. As I could recall, JOIN

Expand|Select|Wrap|Line Numbers

... FROM t1 ... JOIN t2 ON blnExpr

is a later SQL syntax improvement for cartesian product filtering

Expand|Select|Wrap|Line Numbers

... FROM t1, t2 WHERE blnExpr

The only difference known to me is that Expr from the first snippet must be evaluated on fields from both tables being joined.
All the rest seems to be allowed - any kind and combination of arithmetic and logic operations, VBA calls etc. So long it turns into boolean.

Advantage of JOIN syntax is that it allows outer joins and explicitely define the order of joining.

In context of OP's question both syntaxes performs more or less equally.

Regards,
Fish

P.S. Did you mean something else?

May 28 '08 #20

2,545

Expert Mod 2GB

Hi Fish, and thanks for a very thoughtful reply. What I was referring to is that for the most part Cartesian products are avoided, and seen as 'a bad thing', because of the row multiplication involved, but there are occasions when no other approach will do. The lack of an explicit join could be detrimental in some circumstances, but as your reply clarifies the JOIN and WHERE approaches have similar roots and will perform similarly given the small scale of the banding table involved.

Where I am coming from is that the use of a Cartesian product in this case is quite deliberate and intentional, as it is dictated by the disjoint nature of the datasets - a very different case to the accidental creation of a cross-multipled table for related data by neglecting to join two tables correctly.

One other difference in the JOIN and WHERE approaches is that with explicit JOINs the database engine query optimisers can avoid projecting the Cartesian product to begin with, thereby operating more efficiently than applying a WHERE clause after the event might do - depending on how efficient the underlying operations and the optimising algorithms really are (which we can only infer from overall performance, as the optimisations are not visible or alterable by us).

Thank you again for your thoughtful reply which provides further explanation of the JOIN vs WHERE approaches. Really interesting, even if we are drifting a bit from the original thread!

-Stewart

May 28 '08 #21

2,653

Expert 2GB

.....
Where I am coming from is that the use of a Cartesian product in this case is quite deliberate and intentional, as it is dictated by the disjoint nature of the datasets - a very different case to the accidental creation of a cross-multipled table for related data by neglecting to join two tables correctly.
.....

Sorry, Stewart.

What "disjoint nature" is all about?
Just couldn't understand why it does "dictate". ;)

Regards,
Fish

May 28 '08 #22

2,545

Expert Mod 2GB

Hi Fish. Disjoint is a term from set theory and just means that there are no elements in common. Set theory underlies the relational model developed by EF Codd in the 60s, updated by him since and (imperfectly) implemented in the various standards for SQL ever since.

When there is a relation between tables T1 and T2 the underlying sets have at least one element of overlap. There is no such overlap in the case of a range table (where each row represents a paired minimum to maximum range), and the data which contains discrete values (not min and max tuples). The sets are disjoint, therefore no common fields, so no way to join them on a relationship between fields.

To illustrate: age part of student data set:

{16, 19, 18, 18, 15, 64, 20,...,32}

Part of range set:

{{14, 15}, {16, 17}, {18, 21},...{61, 100}}

The sets are disjoint, so no value-based join is possible. If the product of the set of N values with a range set of M values was created there would be N x M rows in the result - the Cartesian product. But there is a functional relationship between ages and age ranges, which is that the discrete value will match one of the tuples where its value is between the values of the tuple elements - so although no 1-1 join is possible the two datasets of N x M rows can still be combined to give just N rows for the result by using the WHERE clause to find the correct range.

There is a more formal treatment of disjoint sets at Wikipedia - linked here.

I hope this helps.

-Stewart

Sorry, Stewart.

What "disjoint nature" is all about?
Just couldn't understand why it does "dictate". ;)

Regards,
Fish

May 29 '08 #23

2,653

Expert 2GB

Ok, Stewart.

You have not convinced me though. What you are talking about is table PK/FK relations based on key equalness. Table join, on the other hand, is a tool to combine records on some criteria regardless of PK/FK relationships. Sure it is commonly used to get denormalized view of normalized tables, but the area of table join application is definitely not limited by this.

Expressions like

Expand|Select|Wrap|Line Numbers

SELECT * FROM t1, t2 WHERE t1.f1>t2.UpperBound AND t1.f1<t2.LowerBound;

and

Expand|Select|Wrap|Line Numbers

 
SELECT * FROM t1 INNER JOIN t2 ON t1.f1>t2.UpperBound AND t1.f1<t2.LowerBound;

are equivalents and both working.
The advantage of JOIN syntax is, as I've stated already, explicit join order and possibility to perform outer (LEFT, RIGHT) joins.
BTW, earlier, outer joins had a special syntax:

Expand|Select|Wrap|Line Numbers

... FROM t1, t2 WHERE t1.f1 *= t2.f2

was used to make left join
and

Expand|Select|Wrap|Line Numbers

... FROM t1, t2 WHERE t1.f1 =* t2.f2

for right join.
In some RDBMS this syntax is accepted to provide backward compatibility but Jet SQL doesn't support it.

Regards,
Fish

May 29 '08 #24

2,545

Expert Mod 2GB

Thank you, Fish, for another very interesting reply. I live and learn - I have not myself used table joins based on inequality criteria as you have shown. You are quite right that I am looking at the relationships from a PK/FK perspective. In those terms, there is a fundamental difference in the joining of the sets involved. In a PK/FK relationship all FKs are a subset of the PK set; in a table relationship where a table join as you have shown is involved one set is not a subset of the other. Different views of the same situation - and both ways valid. Really interesting!

-Stewart

May 29 '08 #25

ASP / Active Server Pages

2,545

Expert Mod 2GB

Update on post 25: In the query editor SQL view Access accepts the table join version of the SQL just as Fish has said, but the query editor cannot then show the query in design view. The SQL table join statement does indeed work equivalently otherwise.

Expand|Select|Wrap|Line Numbers

 SELECT [Age Histogram Base].Type, [Age Histogram Base].Age, [Age Histogram Base].Gender, [Age Band].[Band], [Age Band].[Band Label]

FROM [Age Histogram Base] INNER JOIN [Age Band]

ON [Age Histogram Base].Age >= [Age Band].[Min] And [Age Histogram Base].Age <= [Age Band].[Max];

Timing update
Running a quick test of timing differences on an enlarged set of data (221k rows) the summary crosstab based on the WHERE version of the banded data took 6 seconds to run. The same crosstab based on the table-join version took just under 30 seconds to run.

On the simple Count of the banded rows both versions ran in about 6 seconds.

-Stewart

May 29 '08 #26

NeoPa

32,556

Expert Mod 16PB

Nice idea Stewart.

I'm curious as to why you wouldn't use an INNER JOIN to link the tables though?

I would expect that to work better than a simple WHERE clause.

You boys seem to have sorted all this without me (You're just too quick :D).

FYI Fish correctly interpreted exactly the point I was raising.

I would expect, conceptually, that the INNER JOIN should work more efficiently, although I accept that this may well be negligible for small amounts of locally held data. It may prove more important if linking to remote (Non-Access) tables though.

May 29 '08 #27

Similar topics

Grouping in SQL query

by: Debbie Davis | last post by:

Hi there, SQL 2000 I have the following query: SELECT sponsor, COUNT(sponsor) * 2 AS total FROM Referrals GROUP BY sponsor Works great, returns the sponsor and the total * 2 of their...

Empty report grouping

by: Brian Coy | last post by:

I am creating a database to track scrap on a daily basis at my plant. I have to provide a weekly scrap report with the amount of each part scrapped per day. I have the basic database set up, and...

A2K: Report formatting issue grouping on "Fld<>'Value'"

by: ahaque38 | last post by:

Hello. Using A2K SP3, I am having the following problem with a report using "Sorting and Grouping". I have recently added a grouping in the reports for "Category2<>'CONTRACTS'". I have...

Formatting within Access report text controls - OR - sorting and grouping in Word

by: Mike MacSween | last post by:

tblCourses one to many to tblEvents. A course may have an intro workshop (a type of event), a mid course workshop, a final exam. Or any combination. Or something different in the future. At...

Basic query question regarding parameters and grouping.

by: Regnab | last post by:

I often want to query the database with certain parameters in my query which I do not want included in the query result. The problem is when I want to group on 1, sum the 2nd (and group by) and...

Grouping with detail

by: UHelix | last post by:

Hi, I'm not really even sure how to phrase this question, so here is a sample of exactly what I want to do. In the following data set I would like to select the entry from the group of samples 1...

Access Not Grouping Records

by: craig.buchinski | last post by:

Ok, i have a Access Database (which is used with my ESRI Arcmap) and i have written a query for a report to pull out our streetlights and group them by billing wattage. The problem is that it is...

Sorting/Grouping report on column(1) of combo box

by: Jimmy | last post by:

Is there a way to sort/group a report based on the second column of a combo box, i.e. the text associated with the primary key number?

Grouping Data Range

by: h2lm2t | last post by:

Could anyone please take a look at this? I have a table with 3 columns: ID, ZIP and Count as below: Original Table ID ZIP Count 1 00001 12 2 00002 12 3 00003 ...