Connecting Tech Pros Worldwide Forums | Help | Site Map

How to group date periods together ?

James Conrad StJohn Foreman
Guest
 
Posts: n/a
#1: Jun 29 '06
[DB2 UDB LUW 8.2.4] I've got bookings per day in a table:

bkgdate bookings company
1/1/06 50 A
2/1/06 45 A
3/1/06 55 A
4/1/06 40 A
5/1/06 30 A
6/1/06 45 A
7/1/06 35 A
....

I want to segment this into blocks of days where there are at least 100
bookings in each block.

So the solution I want is along the lines of

date_segment startdate enddate (bookings)
1 1/1/06 3/1/06 150
2 4/1/06 6/1/06 115
....

So far I've managed to do

SELECT bkgdate, company,
sum(bookings) over (partition by company order by bkgdate) as
cume_bookings
FROM table1

which gives me

bkgdate cume_bookings
1/1/06 50
2/1/06 95
3/1/06 150
4/1/06 190
....

I was thinking about joining this result set to itself, but that
doesn't quite give me the results I want, and the SQL is getting more
and more convoluted. Is there a more obvious way to do this that I'm
missing?

Thanks


JCSJF


Tonkuma
Guest
 
Posts: n/a
#2: Jun 30 '06

re: How to group date periods together ?


I got the result this way. But, I feel it too complex. Althogh, I tried
to simlify or modify some part of this code. I failed to simplify any
more. Are there anyone who could show more simple or easily
understandable way?

WITH Cum_bookings AS (
SELECT bkgdate
, bookings
, SUM(bookings) OVER(ORDER BY bkgdate
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS bookings_cum
, ROWNUMBER() OVER(ORDER BY bkgdate) AS rn
FROM Bookings
)
,Ranges AS (
SELECT
MIN(startdate) AS startdate
, MIN(enddate) AS enddate
, MIN(bookings_cum) AS bookings_cum
, MIN(Lrn) AS Lrn
, Hrn
FROM (SELECT MIN(L.bkgdate) AS startdate
, MIN(H.bkgdate) AS enddate
, MIN( H.bookings_cum
- L.bookings_cum
+ L.bookings ) AS bookings_cum
, L.rn AS Lrn
, MIN(H.rn) AS Hrn
FROM Cum_bookings L
, Cum_bookings H
WHERE L.rn < H.rn
AND ( H.bookings_cum
- L.bookings_cum
+ L.bookings ) >= 100
GROUP BY
L.rn
) L
GROUP BY Hrn
)
SELECT INT(ROWNUMBER() OVER(ORDER BY startdate)) AS date_segment
, startdate
, enddate
, bookings_cum AS "(bookings)"
FROM Ranges R
WHERE EXISTS
(SELECT *
FROM Ranges E
WHERE E.Lrn = R.Hrn + 1)
OR EXISTS
(SELECT *
FROM Ranges E
WHERE E.Hrn = R.Lrn - 1)
;
--------------------------------------------------------------------

DATE_SEGMENT STARTDATE ENDDATE (bookings)
------------ ---------- ---------- -----------
1 2006-01-01 2006-03-01 150
2 2006-04-01 2006-06-01 115

2 record(s) selected.

--CELKO--
Guest
 
Posts: n/a
#3: Jun 30 '06

re: How to group date periods together ?


Can we start by putting the date in the proper format?
[color=blue][color=green]
>> bkgdate bookings company[/color][/color]
2006-01-01 50 A
2006-02-01 45 A
2006-03-01 55 A
2006-04-01 40 A
2006-05-01 30 A
2006-06-01 45 A
2006-07-01 35 A
....
I want to segment this into blocks of days where there are at least 100
bookings in each block. <<

What do you do about overlaps and subsets?

2006-01-01 2006-03-01 150
2006-02-01 2006-03-01 100 <== subset of first row
2006-03-01 2006-05-01 125
2006-04-01 2006-06-01 115 <== overlaps following row
2006-05-01 2006-07-01 110
etc.

Likewise, we could get "super blocks" like ('2006-01-01', '2006-04-01')
which meet the criteria, but are not minimal.

This seems like a good candidate for a SUM(bookings)
OVER (PARTITION BY company ORDER BY bkgdate ROWS <<magic range clause>>
FOLLOWING)

but I am having trouble with my magic :) Can we get a better spec?

--CELKO--
Guest
 
Posts: n/a
#4: Jun 30 '06

re: How to group date periods together ?


>> I got the result this way. But, I feel it too complex. <<

I thought it was quite clever myself!

I was playing with the idea of setting up all possible partitions in a
table, then seeing which one (if any) met the summation criteria.
Something like this:

bkgdate part_grp part_nbr
=======================
2006-01-01 1 1
2006-02-01 1 1
2006-03-01 1 1
2006-04-01 1 2
2006-05-01 1 2
2006-06-01 1 2
2006-07-01 1 2

2006-01-01 2 1
2006-02-01 2 1
2006-03-01 2 1
2006-04-01 2 1
2006-05-01 2 2
2006-06-01 2 2
2006-07-01 2 3

Wow! That would get big really fast!

Tonkuma
Guest
 
Posts: n/a
#5: Jul 1 '06

re: How to group date periods together ?


Previous my Query was wrong.
Here is right(I hope) Query using Recirsive query.

WITH Nbr_bookings AS (
SELECT bkgdate
, bookings
, INT(ROWNUMBER() OVER(ORDER BY bkgdate)) AS rn
FROM Bookings
)
,Recurse
(rn, date_segment, startdate, enddate, bookings_sum,
next_segment_flag) AS (
SELECT rn
, 1
, bkgdate
, bkgdate
, bookings
, 0
FROM Nbr_bookings
WHERE rn = 1
UNION ALL
SELECT new_rn
, CASE
WHEN next_segment_flag = 1 THEN
date_segment + 1
ELSE date_segment
END
, CASE
WHEN next_segment_flag = 1 THEN
bkgdate
ELSE startdate
END
, bkgdate
, CASE
WHEN next_segment_flag = 1 THEN
bookings
ELSE bookings_sum + bookings
END
, new_segment_flag
FROM (SELECT pre.rn + 1 AS new_rn
, pre.date_segment
, pre.startdate
, pre.enddate
, pre.bookings_sum
, pre.next_segment_flag
, new.*
, CASE
WHEN pre.next_segment_flag = 0
AND ( pre.bookings_sum
+ new.bookings ) >= 100
OR new.bookings >= 100 THEN
1
ELSE 0
END AS new_segment_flag
FROM Recurse pre
, Nbr_bookings new
WHERE pre.rn < 10000
AND pre.rn + 1 = new.rn
) Q
)
SELECT date_segment
, startdate
, enddate
, bookings_sum AS "(bookings)"
FROM Recurse
WHERE next_segment_flag = 1
;
------------------------------------------------------------------------------

DATE_SEGMENT STARTDATE ENDDATE (bookings)
------------ ---------- ---------- -----------
1 2006-01-01 2006-03-01 150
2 2006-04-01 2006-06-01 115

2 record(s) selected.

Tonkuma
Guest
 
Posts: n/a
#6: Jul 3 '06

re: How to group date periods together ?


This also seems work well.

WITH Cum_bookings AS (
SELECT bkgdate
, bookings
, SUM(bookings) OVER(ORDER BY bkgdate
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS bookings_cum
, ROWNUMBER() OVER(ORDER BY bkgdate) AS rn
FROM Bookings
)
,Ranges AS (
SELECT
L.rn AS Lrn
, MIN(H.rn) AS Hrn
FROM Cum_bookings L
, Cum_bookings H
WHERE L.rn <= H.rn
AND ( H.bookings_cum
- L.bookings_cum
+ L.bookings ) >= 100
GROUP BY
L.rn
)
SELECT INT(ROWNUMBER() OVER(ORDER BY Lrn)) AS date_segment
, L.bkgdate AS startdate
, H.bkgdate AS enddate
, ( H.bookings_cum
- L.bookings_cum
+ L.bookings ) AS "(bookings)"
FROM Cum_bookings L
, Cum_bookings H
, (SELECT
MAX(Lrn) AS Lrn
, Hrn
FROM Ranges R
WHERE
(Lrn = 1
OR
EXISTS
(SELECT *
FROM Ranges E
WHERE E.Hrn = R.Lrn - 1)
)
AND
(Hrn = (SELECT MAX(Hrn) FROM Ranges)
OR
EXISTS
(SELECT *
FROM Ranges E
WHERE E.Lrn = R.Hrn + 1)
)
GROUP BY
Hrn
) S
WHERE L.rn = S.Lrn
AND H.rn = S.Hrn
;
--------------------------------------------------------------------

DATE_SEGMENT STARTDATE ENDDATE (bookings)
------------ ---------- ---------- -----------
1 2006-01-01 2006-03-01 150
2 2006-04-01 2006-06-01 115

2 record(s) selected.

James Conrad StJohn Foreman
Guest
 
Posts: n/a
#7: Jul 6 '06

re: How to group date periods together ?


What do you do about overlaps and subsets?
Quote:
>
2006-01-01 2006-03-01 150
2006-02-01 2006-03-01 100 <== subset of first row
2006-03-01 2006-05-01 125
2006-04-01 2006-06-01 115 <== overlaps following row
2006-05-01 2006-07-01 110
etc.
>
Likewise, we could get "super blocks" like ('2006-01-01', '2006-04-01')
which meet the criteria, but are not minimal.
>
This seems like a good candidate for a SUM(bookings)
OVER (PARTITION BY company ORDER BY bkgdate ROWS <<magic range clause>>
FOLLOWING)
>
but I am having trouble with my magic :) Can we get a better spec?
Sorry. As in all things, seemed a bit simpler than it was. I want
disjoint spanning groups, with each group being minimal.

Tonkuma
Guest
 
Posts: n/a
#8: Jul 9 '06

re: How to group date periods together ?


As in all things, seemed a bit simpler than it was.
I agree. At least, my previous posts used a little complex algorithm or
extra code.
Quote:
I want disjoint spanning groups, with each group being minimal.
But, this problem is not so simple. Because it is neccesary to remove
overlaps, sub-sets as CELKO wrote. And all data(rows) need to be
included in a group except last some rows if sum of bookings in the
group is less than 100.

Here is another example:
WITH Sum_bookings AS (
SELECT L.bkgdate AS startdate
, H.bkgdate AS enddate
, SUM(S.bookings) AS bookings_sum
FROM Bookings L
, Bookings H
, Bookings S
WHERE L.bkgdate <= H.bkgdate
AND S.bkgdate BETWEEN L.bkgdate AND H.bkgdate
GROUP BY
L.bkgdate, H.bkgdate
HAVING SUM(S.bookings) >= 100
)
,Recurse (date_segment, startdate, enddate, bookings_sum) AS (
SELECT 1
, startdate
, enddate
, bookings_sum
FROM Sum_bookings
WHERE startdate
= (SELECT MIN(bkgdate) FROM bookings)
AND enddate
= (SELECT MIN(enddate) FROM Sum_bookings)
UNION ALL
SELECT pre.date_segment + 1
, new.startdate
, new.enddate
, new.bookings_sum
FROM Recurse pre
, Sum_bookings new
WHERE pre.date_segment < 100000
AND new.startdate
= (SELECT MIN(startdate)
FROM Sum_bookings M
WHERE M.startdate pre.enddate
)
AND new.enddate
= (SELECT MIN(enddate)
FROM Sum_bookings M
WHERE M.startdate pre.enddate
)
)
SELECT * FROM Recurse;

Closed Thread