445,645 Members | 1,048 Online
Need help? Post your question and get tips & solutions from a community of 445,645 IT Pros & Developers. It's quick & easy.

# Grouping Query Range

 P: 39 Hi I have a query which groups Number of People together by number of days and gets the following result : Days-------- Number of People 1----------------100 2----------------150 3 ---------------- 175 4----------------200 5----------------225 6----------------75 7----------------100 However I would like to group the data like this: Days ----------- Number of People 0 - 1----------------100 2 - 3----------------325 4 - 5---------------- 425 6+----------------175 Any ideas how to do this? Thanks May 23 '08 #1
26 Replies

 Expert 5K+ P: 8,669 Hi I have a query which groups Number of People together by number of days and gets the following result : Days-------- Number of People 1----------------100 2----------------150 3 ---------------- 175 4----------------200 5----------------225 6----------------75 7----------------100 However I would like to group the data like this: Days ----------- Number of People 0 - 1----------------100 2 - 3----------------325 4 - 5---------------- 425 6+----------------175 Any ideas how to do this? Thanks Expand|Select|Wrap|Line Numbers SELECT DISTINCTROW Partition([Days],0,500,2) AS Day_Range, Sum([Number of People]) AS [People in Range] FROM  GROUP BY Partition([Days],0,500,2);   SAMPLE OUTPUT: Expand|Select|Wrap|Line Numbers Day_Range    People in Range    0:1              100    2:3               325    4:5               21   May 23 '08 #2

 P: 39 Expand|Select|Wrap|Line Numbers SELECT DISTINCTROW Partition([Days],0,500,2) AS Day_Range, Sum([Number of People]) AS [People in Range] FROM  GROUP BY Partition([Days],0,500,2);   SAMPLE OUTPUT: Expand|Select|Wrap|Line Numbers Day_Range    People in Range    0:1              100    2:3               325    4:5               21   Thanks for your help but just by looking at this i can tell its not going to work!! I assume this is not a simple process? Any one any other ideas? May 26 '08 #3

 Expert 5K+ P: 8,669 Thanks for your help but just by looking at this i can tell its not going to work!! I assume this is not a simple process? Any one any other ideas? Actually, it is a very simple process. Copy and Paste the SQL Statement into the SQL View of a Query, substituting your Table name for the demo. Just for my own curiosity, what makes you say it won't work even though you haven't tried it? May 26 '08 #4

 P: 39 Actually, it is a very simple process. Copy and Paste the SQL Statement into the SQL View of a Query, substituting your Table name for the demo. Just for my own curiosity, what makes you say it won't work even though you haven't tried it? Sorry, I was in a bit of a mood with SQL at the time I replied!! But what makes me think it wont work is the 6 and over field. I assume yr solution only allows for fixed ranges. 0-1, 1-2, 3-4. I really want 0, 1, 2, 3 - 5 and 6 and over. Will your solutioin give me this? May 27 '08 #5

 Expert 5K+ P: 8,669 Sorry, I was in a bit of a mood with SQL at the time I replied!! But what makes me think it wont work is the 6 and over field. I assume yr solution only allows for fixed ranges. 0-1, 1-2, 3-4. I really want 0, 1, 2, 3 - 5 and 6 and over. Will your solutioin give me this? Sorry, I misinterpreted your request. You are correct in that my solution will only allow for a fixed range with an interval of 1. I do not think that there is an SQL based solution to your problem, but then again, a VBA based solution should not be that difficult. SQL is not my strong point, but please be patient and I'll call in some of the more proficient SQL experts in on this problem. May 27 '08 #6

 Expert 2.5K+ P: 2,653 Hi, JC. Partition function will return value w/o right bound whenever its Number argument higher then its stop argument example ? Partition(1,0,5,2) 0: 1 ? Partition(2,0,5,2) 2: 3 ? Partition(5,0,5,2) 4: 5 ? Partition(6,0,5,2) 6: ? Partition(16,0,5,2) 6: You may reformat Partition() function output to what you like or write your own VBA function doing the same. Additionally there is a pure SQL solution but it requires additional table. Regards, Fish May 27 '08 #7

 P: 39 Hi, JC. Partition function will return value w/o right bound whenever its Number argument higher then its stop argument example ? Partition(1,0,5,2) 0: 1 ? Partition(2,0,5,2) 2: 3 ? Partition(5,0,5,2) 4: 5 ? Partition(6,0,5,2) 6: ? Partition(16,0,5,2) 6: You may reformat Partition() function output to what you like or write your own VBA function doing the same. Additionally there is a pure SQL solution but it requires additional table. Regards, Fish OK! Thanks for all your help! Ive managed to solve the problem by having a query with 6 subqueries in it! Seems to work fine! Thanks for all your help May 27 '08 #8

Expert Mod 2.5K+
P: 2,545
Hi. The most flexible way to do what you ask is as Fish indicated - use a separate band table. That way you can use bands with unequal boundaries (as illustrated in the attached zipped database).

The banding table is added to the base data without using INNER, LEFT or RIGHT JOINs; instead, the rows of the banding table are selected where the value being compared is between the band minimum and maximum. The query with the joined banding can then be used as the basis for crosstabs, simple counts, histograms and so on.

I attach in zip form a sample database (A2003 format) which provides a simple example of age banding data. The age band table is used in the queries to provide the base data for histograms.

The extract is age, category and gender of a set of students, and illustrates in cut-down form how such banding works in practice. I use this technique (on a larger dataset) to perform age-related analysis of student data by gender, course type, location, campus, course, qualification and so on - all based on a single overarching query providing a view of the student data banded by age band.

-Stewart
Attached Files
 Histogram.zip (19.4 KB, 293 views)
May 27 '08 #9

 Expert 5K+ P: 8,669 Hi. The most flexible way to do what you ask is as Fish indicated - use a separate band table. That way you can use bands with unequal boundaries (as illustrated in the attached zipped database). The banding table is added to the base data without using INNER, LEFT or RIGHT JOINs; instead, the rows of the banding table are selected where the value being compared is between the band minimum and maximum. The query with the joined banding can then be used as the basis for crosstabs, simple counts, histograms and so on. I attach in zip form a sample database (A2003 format) which provides a simple example of age banding data. The age band table is used in the queries to provide the base data for histograms. The extract is age, category and gender of a set of students, and illustrates in cut-down form how such banding works in practice. I use this technique (on a larger dataset) to perform age-related analysis of student data by gender, course type, location, campus, course, qualification and so on - all based on a single overarching query providing a view of the student data banded by age band. -Stewart Hello and thanks for the reply, Stewart. I'd like to see that Attached DB whenever you get a chance. May 27 '08 #10

 Expert Mod 2.5K+ P: 2,545 Should be visible now, ADezii - if not please let me know. -Stewart Hello and thanks for the reply, Stewart. I'd like to see that Attached DB whenever you get a chance. May 27 '08 #11

 Expert 5K+ P: 8,669 Should be visible now, ADezii - if not please let me know. -Stewart I know I'm getting older, Stewart, but I still can't see anything. No comment necessary, NeoPa! (LOL). May 27 '08 #12

 Expert Mod 15k+ P: 31,656 Subscribing . May 27 '08 #13

 Expert Mod 2.5K+ P: 2,545 Hi ADezii. It's attached on post #9 - at least I can see it and it is listed as one view, so someone else can too... By the way, although the example uses a Count query any form of totalling can be done on the banded data - Sum (which the examples in post 1 would require), Average, Min, Max, whatever. -Stewart I know I'm getting older, Stewart, but I still can't see anything. No comment necessary, NeoPa! (LOL). May 27 '08 #14

 Expert 5K+ P: 8,669 Hi ADezii. It's attached on post #9 - at least I can see it and it is listed as one view, so someone else can too... By the way, although the example uses a Count query any form of totalling can be done on the banded data - Sum (which the examples in post 1 would require), Average, Min, Max, whatever. -Stewart Thanks Stewart for the explanation and Attachment, interesting concept. May 28 '08 #15

 Expert Mod 15k+ P: 31,656 Hi. The most flexible way to do what you ask is as Fish indicated - use a separate band table. That way you can use bands with unequal boundaries (as illustrated in the attached zipped database). The banding table is added to the base data without using INNER, LEFT or RIGHT JOINs; instead, the rows of the banding table are selected where the value being compared is between the band minimum and maximum. ... Nice idea Stewart. I'm curious as to why you wouldn't use an INNER JOIN to link the tables though? I would expect that to work better than a simple WHERE clause. May 28 '08 #16

 Expert Mod 2.5K+ P: 2,545 Hi NeoPa. Reason for no joins is that there are no common fields involved. The comparison is of a single discrete value in one table against a minimum to maximum range of values in the banding table, not key against key. It is the principle of the histogram, where discrete values of a population are grouped over specified intervals into separate but contiguous bands, with counts for each band subsequently displayed in the typical column chart form. This kind of banding application is one of those rare occasions when such a Cartesian join between tables is essential, with the inherent multiplication of rows stopped by the banding range always matching just one of the rows in the banding table. The reason a band number is included is to be able to order the bands correctly if required - and to be able to left-join the banding table back to the results later if necessary (because, depending on the range of the data, the result bandings are likely to be non-contiguous, and left-joining the bandings back will allow inclusion of all bands, even if there are no results for particular bands). Hope this helps to explain the approach. -Stewart May 28 '08 #17

 Expert 2.5K+ P: 2,653 I agree with NeoPa. I would prefer to JOIN tables though with only two tables involved it seems not to have a great advantage. Regards, Fish May 28 '08 #18

 Expert Mod 2.5K+ P: 2,545 I agree entirely, Fish. When working with relational tables there are always joins involved, and in all of my applications I define the joins and set relational constraints as a matter of routine. However, the banding application is one of these rare occasions when this is simply not possible - in set algebra terms the sets involved are disjoint and cannot be equijoined or outer joined. When I analyse several years worth of student data, involving around 30,000 rows, the time penalties and delays are in the subsequent multiple-field grouping and summations not in the formation of a banded base table involving joining of the 30,000 rows to a small table of around 8 rows using a where clause to restrict the rows joined. -Stewart May 28 '08 #19

 Expert 2.5K+ P: 2,653 Well. As I could recall, JOIN Expand|Select|Wrap|Line Numbers ... FROM t1 ... JOIN t2 ON blnExpr   is a later SQL syntax improvement for cartesian product filtering Expand|Select|Wrap|Line Numbers ... FROM t1, t2 WHERE blnExpr   The only difference known to me is that Expr from the first snippet must be evaluated on fields from both tables being joined. All the rest seems to be allowed - any kind and combination of arithmetic and logic operations, VBA calls etc. So long it turns into boolean. Advantage of JOIN syntax is that it allows outer joins and explicitely define the order of joining. In context of OP's question both syntaxes performs more or less equally. Regards, Fish P.S. Did you mean something else? May 28 '08 #20

 Expert Mod 2.5K+ P: 2,545 Hi Fish, and thanks for a very thoughtful reply. What I was referring to is that for the most part Cartesian products are avoided, and seen as 'a bad thing', because of the row multiplication involved, but there are occasions when no other approach will do. The lack of an explicit join could be detrimental in some circumstances, but as your reply clarifies the JOIN and WHERE approaches have similar roots and will perform similarly given the small scale of the banding table involved. Where I am coming from is that the use of a Cartesian product in this case is quite deliberate and intentional, as it is dictated by the disjoint nature of the datasets - a very different case to the accidental creation of a cross-multipled table for related data by neglecting to join two tables correctly. One other difference in the JOIN and WHERE approaches is that with explicit JOINs the database engine query optimisers can avoid projecting the Cartesian product to begin with, thereby operating more efficiently than applying a WHERE clause after the event might do - depending on how efficient the underlying operations and the optimising algorithms really are (which we can only infer from overall performance, as the optimisations are not visible or alterable by us). Thank you again for your thoughtful reply which provides further explanation of the JOIN vs WHERE approaches. Really interesting, even if we are drifting a bit from the original thread! -Stewart May 28 '08 #21

 Expert 2.5K+ P: 2,653 ..... Where I am coming from is that the use of a Cartesian product in this case is quite deliberate and intentional, as it is dictated by the disjoint nature of the datasets - a very different case to the accidental creation of a cross-multipled table for related data by neglecting to join two tables correctly. ..... Sorry, Stewart. What "disjoint nature" is all about? Just couldn't understand why it does "dictate". ;) Regards, Fish May 28 '08 #22

 Expert Mod 2.5K+ P: 2,545 Hi Fish. Disjoint is a term from set theory and just means that there are no elements in common. Set theory underlies the relational model developed by EF Codd in the 60s, updated by him since and (imperfectly) implemented in the various standards for SQL ever since. When there is a relation between tables T1 and T2 the underlying sets have at least one element of overlap. There is no such overlap in the case of a range table (where each row represents a paired minimum to maximum range), and the data which contains discrete values (not min and max tuples). The sets are disjoint, therefore no common fields, so no way to join them on a relationship between fields. To illustrate: age part of student data set: {16, 19, 18, 18, 15, 64, 20,...,32} Part of range set: {{14, 15}, {16, 17}, {18, 21},...{61, 100}} The sets are disjoint, so no value-based join is possible. If the product of the set of N values with a range set of M values was created there would be N x M rows in the result - the Cartesian product. But there is a functional relationship between ages and age ranges, which is that the discrete value will match one of the tuples where its value is between the values of the tuple elements - so although no 1-1 join is possible the two datasets of N x M rows can still be combined to give just N rows for the result by using the WHERE clause to find the correct range. There is a more formal treatment of disjoint sets at Wikipedia - linked here. I hope this helps. -Stewart Sorry, Stewart. What "disjoint nature" is all about? Just couldn't understand why it does "dictate". ;) Regards, Fish May 29 '08 #23

 Expert 2.5K+ P: 2,653 Ok, Stewart. You have not convinced me though. What you are talking about is table PK/FK relations based on key equalness. Table join, on the other hand, is a tool to combine records on some criteria regardless of PK/FK relationships. Sure it is commonly used to get denormalized view of normalized tables, but the area of table join application is definitely not limited by this. Expressions like Expand|Select|Wrap|Line Numbers SELECT * FROM t1, t2 WHERE t1.f1>t2.UpperBound AND t1.f1t2.UpperBound AND t1.f1

 Expert Mod 2.5K+ P: 2,545 Thank you, Fish, for another very interesting reply. I live and learn - I have not myself used table joins based on inequality criteria as you have shown. You are quite right that I am looking at the relationships from a PK/FK perspective. In those terms, there is a fundamental difference in the joining of the sets involved. In a PK/FK relationship all FKs are a subset of the PK set; in a table relationship where a table join as you have shown is involved one set is not a subset of the other. Different views of the same situation - and both ways valid. Really interesting! -Stewart May 29 '08 #25

 Expert Mod 2.5K+ P: 2,545 Update on post 25: In the query editor SQL view Access accepts the table join version of the SQL just as Fish has said, but the query editor cannot then show the query in design view. The SQL table join statement does indeed work equivalently otherwise. Expand|Select|Wrap|Line Numbers SELECT [Age Histogram Base].Type, [Age Histogram Base].Age, [Age Histogram Base].Gender, [Age Band].[Band], [Age Band].[Band Label] FROM [Age Histogram Base] INNER JOIN [Age Band] ON [Age Histogram Base].Age >= [Age Band].[Min] And [Age Histogram Base].Age <= [Age Band].[Max]; Timing update Running a quick test of timing differences on an enlarged set of data (221k rows) the summary crosstab based on the WHERE version of the banded data took 6 seconds to run. The same crosstab based on the table-join version took just under 30 seconds to run. On the simple Count of the banded rows both versions ran in about 6 seconds. -Stewart May 29 '08 #26

 Expert Mod 15k+ P: 31,656 Nice idea Stewart. I'm curious as to why you wouldn't use an INNER JOIN to link the tables though? I would expect that to work better than a simple WHERE clause. You boys seem to have sorted all this without me (You're just too quick :D). FYI Fish correctly interpreted exactly the point I was raising. I would expect, conceptually, that the INNER JOIN should work more efficiently, although I accept that this may well be negligible for small amounts of locally held data. It may prove more important if linking to remote (Non-Access) tables though. May 29 '08 #27