[posted and mailed]
Thomas R. Hummel (to********@hotmail.com) writes:
The rule is basically this... for any given period of time, for a
particular coverage, always use the coverage with the highest rate. So,
given the rows above, I would want the results to be:
coverage_id start_dt end_dt rate
----------- ---------- ---------- --------
1 2004-01-01 2004-06-01 40.00
1 2004-06-01 2004-08-01 30.00
2 2004-01-01 9999-12-31 90.00
Below are two variants. Since you objected to Celko's use of a calendar
table, I thought a little about and realised that you don't need a full
calendar table. You only need one with the actual dates in the table.
This is why there are two alternatives. The first builds a calendar by
coverage_id; the second builds a global calendar. The latter is probably
the better bet, if start and end dates are typically at the start of
months, and thus shared by many coverage ids. The latter solution
supplements the input with sentinel rows with a zero rate from 1900-01-01
to 9999-1231 for each coverage id. (The eager student who want to know
why, can get a hint by looking at the code for the first alternive.)
The calendar itself appears only as a derived table. However, there is a
temp table to hold intermediate results. Just to tease Celko :-) there is
an IDENTITY column in it, but it serves to make the next DELETE a little
easier. In SQL 2005, I would have used the ROW_NUMBER() function instead.
I've also added indexes on the table, with my idea of which will give
the best performance.
Note also that the final SELECT where we return the data, comes in two
different variations. They are not related to the choice of whether
the calendar is global or by coverage_id.
How well this scales, I have no idea. Thomas who have the data will
have to test that part. :-) There are a few more possible variations
on queries, that I hope Thomas will be able to work out, using the
queries as a starting point.
It goes without saying that there may be ugly cases I have overlooked.
I did check one thing: if there is a gap for a coverage_id, this is
handled well. Such a case is added to the repro below.
Finally, looking at Thomas's desired result, I realise that I may
be one day off with regards to end_date. I believe Thomas will be
able to handle that as well.
CREATE TABLE Coverage_Rates (
rate_id INT IDENTITY NOT NULL,
coverage_id INT NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL,
rate MONEY NOT NULL )
GO
INSERT INTO Coverage_Rates VALUES (1, '2004-01-01', '2004-06-01', 40.00)
INSERT INTO Coverage_Rates VALUES (1, '2004-03-01', '2004-08-01', 20.00)
INSERT INTO Coverage_Rates VALUES (1, '2004-06-01', '2004-08-01', 30.00)
INSERT INTO Coverage_Rates VALUES (1, '2004-09-01', '2004-10-01', 25.00)
INSERT INTO Coverage_Rates VALUES (2, '2004-01-01', '9999-12-31', 90.00)
INSERT INTO Coverage_Rates VALUES (2, '2004-03-01', '2004-08-01', 20.00)
INSERT INTO Coverage_Rates VALUES (2, '2004-08-01', '2004-08-01', 30.00)
GO
CREATE TABLE #max_rates(ident int IDENTITY,
coverage_id int NOT NULL,
date datetime NOT NULL,
rate money NOT NULL,
PRIMARY KEY CLUSTERED (ident),
UNIQUE NONCLUSTERED (coverage_id, date))
go
-- First get max rate on all days where this a change, and the days
-- before and after the changes. Note that we must include the day
-- before a period as a day with a zero rate, in case there were no
-- rates at all this day.
INSERT #max_rates (coverage_id, date, rate)
SELECT C.coverage_id, C.date, coalesce(MAX(R.rate), 0)
FROM Coverage_Rates R
RIGHT JOIN
(SELECT coverage_id, date = start_date
FROM Coverage_Rates
UNION
SELECT coverage_id, date = dateadd(DAY, -1, start_date)
FROM Coverage_Rates
UNION
SELECT coverage_id, end_date
FROM Coverage_Rates
UNION
SELECT coverage_id, dateadd(DAY, +1, end_date)
FROM Coverage_Rates
WHERE end_date < '99991231') C
ON C.coverage_id = R.coverage_id
AND C.date BETWEEN R.start_date AND R.end_date
GROUP BY C.coverage_id, C.date
ORDER BY C.coverage_id, C.date
go
SELECT * FROM #max_rates ORDER BY ident
go
-- Delete adjacent rows with the same rate.
DELETE #max_rates
FROM #max_rates a
WHERE EXISTS (SELECT *
FROM #max_rates b
WHERE a.coverage_id = b.coverage_id
AND a.ident - 1 = b.ident
AND a.rate = b.rate)
go
SELECT * FROM #max_rates ORDER BY ident
go
-- And then transform #max_rates back to the Coverage_rates format.
SELECT a.coverage_id, start_date = a.date,
end_date = coalesce(dateadd(DAY, -1, b.date), '99991231'),
a.rate
FROM #max_rates a
LEFT JOIN #max_rates b
ON a.coverage_id = b.coverage_id
AND b.date = (SELECT MIN(c.date)
FROM #max_rates c
WHERE a.coverage_id = c.coverage_id
AND c.date > a.date)
WHERE a.rate > 0
ORDER BY a.coverage_id, a.date
go
DROP TABLE Coverage_Rates, #max_rates
------------------------------------------------------------------------
go
CREATE TABLE Coverage_Rates (
rate_id INT IDENTITY NOT NULL,
coverage_id INT NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL,
rate MONEY NOT NULL )
GO
INSERT INTO Coverage_Rates VALUES (1, '2004-01-01', '2004-06-01', 40.00)
INSERT INTO Coverage_Rates VALUES (1, '2004-03-01', '2004-08-01', 20.00)
INSERT INTO Coverage_Rates VALUES (1, '2004-06-01', '2004-08-01', 30.00)
INSERT INTO Coverage_Rates VALUES (1, '2004-09-01', '2004-10-01', 25.00)
INSERT INTO Coverage_Rates VALUES (2, '2004-01-01', '9999-12-31', 90.00)
INSERT INTO Coverage_Rates VALUES (2, '2004-03-01', '2004-08-01', 20.00)
INSERT INTO Coverage_Rates VALUES (2, '2004-08-01', '2004-08-01', 30.00)
GO
CREATE TABLE #max_rates(ident int IDENTITY,
coverage_id int NOT NULL,
date datetime NOT NULL,
rate money NOT NULL,
PRIMARY KEY CLUSTERED (ident),
UNIQUE NONCLUSTERED (coverage_id, date))
go
INSERT #max_rates (coverage_id, date, rate)
SELECT R.coverage_id, C.date, MAX(R.rate)
FROM (SELECT coverage_id, start_date, end_date, rate
FROM Coverage_Rates
UNION
SELECT coverage_id, '19000101', '99991231', 0
FROM Coverage_Rates) AS R
JOIN (SELECT date = start_date
FROM Coverage_Rates
UNION
SELECT date = dateadd(DAY, -1, start_date)
FROM Coverage_Rates
UNION
SELECT end_date
FROM Coverage_Rates
UNION
SELECT dateadd(DAY, +1, end_date)
FROM Coverage_Rates
WHERE end_date < '99991231') C
ON C.date BETWEEN R.start_date AND R.end_date
GROUP BY R.coverage_id, C.date
ORDER BY R.coverage_id, C.date
go
SELECT * FROM #max_rates ORDER BY ident
go
DELETE #max_rates
FROM #max_rates a
WHERE EXISTS (SELECT *
FROM #max_rates b
WHERE a.coverage_id = b.coverage_id
AND a.ident - 1 = b.ident
AND a.rate = b.rate)
go
SELECT * FROM #max_rates ORDER BY ident
go
SELECT a.coverage_id, start_date = a.date,
end_date = coalesce(dateadd(DAY, -1,
(SELECT MIN(b.date) FROM #max_rates b
WHERE a.coverage_id = b.coverage_id
AND b.date > a.date)), '99991231'),
a.rate
FROM #max_rates a
WHERE a.rate > 0
ORDER BY a.coverage_id, a.date
go
DROP TABLE Coverage_Rates, #max_rates
--
Erland Sommarskog, SQL Server MVP,
es****@sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp