How to group date periods together ? | | |
[DB2 UDB LUW 8.2.4] I've got bookings per day in a table:
bkgdate bookings company
1/1/06 50 A
2/1/06 45 A
3/1/06 55 A
4/1/06 40 A
5/1/06 30 A
6/1/06 45 A
7/1/06 35 A
....
I want to segment this into blocks of days where there are at least 100
bookings in each block.
So the solution I want is along the lines of
date_segment startdate enddate (bookings)
1 1/1/06 3/1/06 150
2 4/1/06 6/1/06 115
....
So far I've managed to do
SELECT bkgdate, company,
sum(bookings) over (partition by company order by bkgdate) as
cume_bookings
FROM table1
which gives me
bkgdate cume_bookings
1/1/06 50
2/1/06 95
3/1/06 150
4/1/06 190
....
I was thinking about joining this result set to itself, but that
doesn't quite give me the results I want, and the SQL is getting more
and more convoluted. Is there a more obvious way to do this that I'm
missing?
Thanks
JCSJF | | | | re: How to group date periods together ?
I got the result this way. But, I feel it too complex. Althogh, I tried
to simlify or modify some part of this code. I failed to simplify any
more. Are there anyone who could show more simple or easily
understandable way?
WITH Cum_bookings AS (
SELECT bkgdate
, bookings
, SUM(bookings) OVER(ORDER BY bkgdate
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS bookings_cum
, ROWNUMBER() OVER(ORDER BY bkgdate) AS rn
FROM Bookings
)
,Ranges AS (
SELECT
MIN(startdate) AS startdate
, MIN(enddate) AS enddate
, MIN(bookings_cum) AS bookings_cum
, MIN(Lrn) AS Lrn
, Hrn
FROM (SELECT MIN(L.bkgdate) AS startdate
, MIN(H.bkgdate) AS enddate
, MIN( H.bookings_cum
- L.bookings_cum
+ L.bookings ) AS bookings_cum
, L.rn AS Lrn
, MIN(H.rn) AS Hrn
FROM Cum_bookings L
, Cum_bookings H
WHERE L.rn < H.rn
AND ( H.bookings_cum
- L.bookings_cum
+ L.bookings ) >= 100
GROUP BY
L.rn
) L
GROUP BY Hrn
)
SELECT INT(ROWNUMBER() OVER(ORDER BY startdate)) AS date_segment
, startdate
, enddate
, bookings_cum AS "(bookings)"
FROM Ranges R
WHERE EXISTS
(SELECT *
FROM Ranges E
WHERE E.Lrn = R.Hrn + 1)
OR EXISTS
(SELECT *
FROM Ranges E
WHERE E.Hrn = R.Lrn - 1)
;
--------------------------------------------------------------------
DATE_SEGMENT STARTDATE ENDDATE (bookings)
------------ ---------- ---------- -----------
1 2006-01-01 2006-03-01 150
2 2006-04-01 2006-06-01 115
2 record(s) selected. | | | | re: How to group date periods together ?
Can we start by putting the date in the proper format?
[color=blue][color=green]
>> bkgdate bookings company[/color][/color]
2006-01-01 50 A
2006-02-01 45 A
2006-03-01 55 A
2006-04-01 40 A
2006-05-01 30 A
2006-06-01 45 A
2006-07-01 35 A
....
I want to segment this into blocks of days where there are at least 100
bookings in each block. <<
What do you do about overlaps and subsets?
2006-01-01 2006-03-01 150
2006-02-01 2006-03-01 100 <== subset of first row
2006-03-01 2006-05-01 125
2006-04-01 2006-06-01 115 <== overlaps following row
2006-05-01 2006-07-01 110
etc.
Likewise, we could get "super blocks" like ('2006-01-01', '2006-04-01')
which meet the criteria, but are not minimal.
This seems like a good candidate for a SUM(bookings)
OVER (PARTITION BY company ORDER BY bkgdate ROWS <<magic range clause>>
FOLLOWING)
but I am having trouble with my magic :) Can we get a better spec? | | | | re: How to group date periods together ?
>> I got the result this way. But, I feel it too complex. <<
I thought it was quite clever myself!
I was playing with the idea of setting up all possible partitions in a
table, then seeing which one (if any) met the summation criteria.
Something like this:
bkgdate part_grp part_nbr
=======================
2006-01-01 1 1
2006-02-01 1 1
2006-03-01 1 1
2006-04-01 1 2
2006-05-01 1 2
2006-06-01 1 2
2006-07-01 1 2
2006-01-01 2 1
2006-02-01 2 1
2006-03-01 2 1
2006-04-01 2 1
2006-05-01 2 2
2006-06-01 2 2
2006-07-01 2 3
Wow! That would get big really fast! | | | | re: How to group date periods together ?
Previous my Query was wrong.
Here is right(I hope) Query using Recirsive query.
WITH Nbr_bookings AS (
SELECT bkgdate
, bookings
, INT(ROWNUMBER() OVER(ORDER BY bkgdate)) AS rn
FROM Bookings
)
,Recurse
(rn, date_segment, startdate, enddate, bookings_sum,
next_segment_flag) AS (
SELECT rn
, 1
, bkgdate
, bkgdate
, bookings
, 0
FROM Nbr_bookings
WHERE rn = 1
UNION ALL
SELECT new_rn
, CASE
WHEN next_segment_flag = 1 THEN
date_segment + 1
ELSE date_segment
END
, CASE
WHEN next_segment_flag = 1 THEN
bkgdate
ELSE startdate
END
, bkgdate
, CASE
WHEN next_segment_flag = 1 THEN
bookings
ELSE bookings_sum + bookings
END
, new_segment_flag
FROM (SELECT pre.rn + 1 AS new_rn
, pre.date_segment
, pre.startdate
, pre.enddate
, pre.bookings_sum
, pre.next_segment_flag
, new.*
, CASE
WHEN pre.next_segment_flag = 0
AND ( pre.bookings_sum
+ new.bookings ) >= 100
OR new.bookings >= 100 THEN
1
ELSE 0
END AS new_segment_flag
FROM Recurse pre
, Nbr_bookings new
WHERE pre.rn < 10000
AND pre.rn + 1 = new.rn
) Q
)
SELECT date_segment
, startdate
, enddate
, bookings_sum AS "(bookings)"
FROM Recurse
WHERE next_segment_flag = 1
;
------------------------------------------------------------------------------
DATE_SEGMENT STARTDATE ENDDATE (bookings)
------------ ---------- ---------- -----------
1 2006-01-01 2006-03-01 150
2 2006-04-01 2006-06-01 115
2 record(s) selected. | | | | re: How to group date periods together ?
This also seems work well.
WITH Cum_bookings AS (
SELECT bkgdate
, bookings
, SUM(bookings) OVER(ORDER BY bkgdate
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS bookings_cum
, ROWNUMBER() OVER(ORDER BY bkgdate) AS rn
FROM Bookings
)
,Ranges AS (
SELECT
L.rn AS Lrn
, MIN(H.rn) AS Hrn
FROM Cum_bookings L
, Cum_bookings H
WHERE L.rn <= H.rn
AND ( H.bookings_cum
- L.bookings_cum
+ L.bookings ) >= 100
GROUP BY
L.rn
)
SELECT INT(ROWNUMBER() OVER(ORDER BY Lrn)) AS date_segment
, L.bkgdate AS startdate
, H.bkgdate AS enddate
, ( H.bookings_cum
- L.bookings_cum
+ L.bookings ) AS "(bookings)"
FROM Cum_bookings L
, Cum_bookings H
, (SELECT
MAX(Lrn) AS Lrn
, Hrn
FROM Ranges R
WHERE
(Lrn = 1
OR
EXISTS
(SELECT *
FROM Ranges E
WHERE E.Hrn = R.Lrn - 1)
)
AND
(Hrn = (SELECT MAX(Hrn) FROM Ranges)
OR
EXISTS
(SELECT *
FROM Ranges E
WHERE E.Lrn = R.Hrn + 1)
)
GROUP BY
Hrn
) S
WHERE L.rn = S.Lrn
AND H.rn = S.Hrn
;
--------------------------------------------------------------------
DATE_SEGMENT STARTDATE ENDDATE (bookings)
------------ ---------- ---------- -----------
1 2006-01-01 2006-03-01 150
2 2006-04-01 2006-06-01 115
2 record(s) selected. | | | | re: How to group date periods together ?
What do you do about overlaps and subsets? Quote:
>
2006-01-01 2006-03-01 150
2006-02-01 2006-03-01 100 <== subset of first row
2006-03-01 2006-05-01 125
2006-04-01 2006-06-01 115 <== overlaps following row
2006-05-01 2006-07-01 110
etc.
>
Likewise, we could get "super blocks" like ('2006-01-01', '2006-04-01')
which meet the criteria, but are not minimal.
>
This seems like a good candidate for a SUM(bookings)
OVER (PARTITION BY company ORDER BY bkgdate ROWS <<magic range clause>>
FOLLOWING)
>
but I am having trouble with my magic :) Can we get a better spec?
Sorry. As in all things, seemed a bit simpler than it was. I want
disjoint spanning groups, with each group being minimal. | | | | re: How to group date periods together ?
As in all things, seemed a bit simpler than it was.
I agree. At least, my previous posts used a little complex algorithm or
extra code. Quote:
I want disjoint spanning groups, with each group being minimal.
But, this problem is not so simple. Because it is neccesary to remove
overlaps, sub-sets as CELKO wrote. And all data(rows) need to be
included in a group except last some rows if sum of bookings in the
group is less than 100.
Here is another example:
WITH Sum_bookings AS (
SELECT L.bkgdate AS startdate
, H.bkgdate AS enddate
, SUM(S.bookings) AS bookings_sum
FROM Bookings L
, Bookings H
, Bookings S
WHERE L.bkgdate <= H.bkgdate
AND S.bkgdate BETWEEN L.bkgdate AND H.bkgdate
GROUP BY
L.bkgdate, H.bkgdate
HAVING SUM(S.bookings) >= 100
)
,Recurse (date_segment, startdate, enddate, bookings_sum) AS (
SELECT 1
, startdate
, enddate
, bookings_sum
FROM Sum_bookings
WHERE startdate
= (SELECT MIN(bkgdate) FROM bookings)
AND enddate
= (SELECT MIN(enddate) FROM Sum_bookings)
UNION ALL
SELECT pre.date_segment + 1
, new.startdate
, new.enddate
, new.bookings_sum
FROM Recurse pre
, Sum_bookings new
WHERE pre.date_segment < 100000
AND new.startdate
= (SELECT MIN(startdate)
FROM Sum_bookings M
WHERE M.startdate pre.enddate
)
AND new.enddate
= (SELECT MIN(enddate)
FROM Sum_bookings M
WHERE M.startdate pre.enddate
)
)
SELECT * FROM Recurse; |  | | | | /bytes/about
We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights.
Get the best answers to your questions from over 226,471 network members.
|