
Mother Celko's Monday SQL Puzzle #3
I am gathering material for a second edition of SQL PUZZLES & ANSWERS.
The easiest way to do this is to post a puzzle and harvest answers.
The solvers get fame (15 seconds, not minutes), glory and their name in
the book. The first edition did a lot of SQL86 code; I would like to
have answers which use more of the SQL92, SQL99 syntax.
===========
Luke Tymowski, a Canadian programmer, posted an interesting problem on
the MSACCESS forum on CompuServe in 1994 November. He was working on a
pension fund problem. In SQL92, the table involved would look like
this:
CREATE TABLE Pensions
(sin CHAR(10) NOT NULL,
pen_year INTEGER NOT NULL,
month_cnt INTEGER DEFAULT 0 NOT NULL
CHECK (month_cnt BETWEEN 0 AND 12),
earnings DECIMAL (8,2) DEFAULT 0.00 NOT NULL);
The SIN column is the Social Insurance Number, which is something like
the SSN Social Security Number used in the United States to identify
taxpayers. The pen_year column is the calendar year of the pension,
the month_cnt column is the number of months in that year the person
worked, and earnings is their total earnings for year.
The problem is to find the total earnings of each employee for the most
recent 60 months of month_cnt in consecutive years. This number is used
to compute their pension. The shortest period going back could be five
years with 12 months in each year applying to the total month_cnt. The
longest period could be 60 years with 1 month in each year. Some
people might work four years and not the fifth, and thus not qualify
for a pension at all.
The reason this is a beast to solve is that "most recent" and
"consecutive" are hard to write in SQL.
HINT: For each employee in each year, insert a row even in the years
they did not work. It not only makes the query easier, but you also
have a record to update when you get in new information.
HINT What about the SQL99 OLAP functions?  
Share:

This is an interesting problem because the answer should almost always
be indeterminate.
How can you determine the most recent 60 month's salary in the following
case?
Employee works for 10 full years starting in January of the first year
and six months in the final, 11th year.
The most recent 60 months starts in the middle of a year, in July. There
is no data available in the database to show the salary earned during
the first six months of the 60 month period. An average monthly salary
for that year could be used but that wouldn't properly account for a pay
raise that occurred in July of the first year used for the calculation.
This issue will occur any time the number of months needed to make 60 is
less than the number of months worked in the earliest year that is used
to build the 60 month period.
The problem is worse for hourly workers who work different numbers of
hours in different months.
If this is a real pension fund, I'm glad I'm not part of it. The
originator's database, MSACCESS, may be an indication of why the design
doesn't provide the data to correctly solve the problem.
Phil Sherman
CELKO wrote: Mother Celko's Monday SQL Puzzle #3
I am gathering material for a second edition of SQL PUZZLES & ANSWERS. The easiest way to do this is to post a puzzle and harvest answers. The solvers get fame (15 seconds, not minutes), glory and their name in the book. The first edition did a lot of SQL86 code; I would like to have answers which use more of the SQL92, SQL99 syntax.
===========
Luke Tymowski, a Canadian programmer, posted an interesting problem on the MSACCESS forum on CompuServe in 1994 November. He was working on a pension fund problem. In SQL92, the table involved would look like this:
CREATE TABLE Pensions (sin CHAR(10) NOT NULL, pen_year INTEGER NOT NULL, month_cnt INTEGER DEFAULT 0 NOT NULL CHECK (month_cnt BETWEEN 0 AND 12), earnings DECIMAL (8,2) DEFAULT 0.00 NOT NULL);
The SIN column is the Social Insurance Number, which is something like the SSN Social Security Number used in the United States to identify taxpayers. The pen_year column is the calendar year of the pension, the month_cnt column is the number of months in that year the person worked, and earnings is their total earnings for year.
The problem is to find the total earnings of each employee for the most recent 60 months of month_cnt in consecutive years. This number is used to compute their pension. The shortest period going back could be five years with 12 months in each year applying to the total month_cnt. The longest period could be 60 years with 1 month in each year. Some people might work four years and not the fifth, and thus not qualify for a pension at all.
The reason this is a beast to solve is that "most recent" and "consecutive" are hard to write in SQL.
HINT: For each employee in each year, insert a row even in the years they did not work. It not only makes the query easier, but you also have a record to update when you get in new information.
HINT What about the SQL99 OLAP functions?   
>> If this is a real pension fund, I'm glad I'm not part of it. <<
Maybe this is one of the reasons Canada has a National Debt greater
than its GDP :)?   
Assuming my solution is correct, getting the 60 months in consecutive
years was pretty straightforward. However, I couldn't avoid using a
subquery to find the most recent of those 60 month blocks. The subquery
itself includes the results you're looking for, only it doesn't reduce
it to the most recent block of years. I also added a column to my
result set to adjust the first year's earnings for the percentage that
could apply to the pension (i.e. person works 6 years, for a total of
71 months, subtract the the first year's earnings * (11 / 12) from the
total_earnings). Here is my solution:
select *
, total_month_cnt % 12 nonutilized_first_year_month_cnt
, (total_month_cnt % 12) / (first_year_month_cnt * 1.0) *
first_year_earnings
first_year_adjustment
, total_earnings 
(total_month_cnt % 12) / (first_year_month_cnt * 1.0) *
first_year_earnings
adjusted_total_earnings
from (
select p1.sin
, p1.pen_year first_year
, p2.pen_year last_year
, p1.month_cnt first_year_month_cnt
, p1.earnings first_year_earnings
, count(p3.pen_year) year_cnt
, sum(p3.month_cnt) total_month_cnt
, sum(p3.earnings) total_earnings
from pensions p1
inner join pensions p2
on p1.sin = p2.sin
inner join pensions p3
on p1.sin = p3.sin
where p3.pen_year between p1.pen_year and p2.pen_year
and p3.month_cnt > 0
group by p1.sin
, p1.pen_year
, p2.pen_year
, p1.month_cnt
, p1.earnings
having count(p3.pen_year) = p2.pen_year  p1.pen_year + 1
and sum(p3.month_cnt) between 60 and 60 + p1.month_cnt  1
) a
where a.last_year =
(
select max(last_year)
from (
select p2.pen_year last_year
from pensions p1
inner join pensions p2
on p1.sin = p2.sin
inner join pensions p3
on p1.sin = p3.sin
where p3.pen_year between p1.pen_year and p2.pen_year
and p3.month_cnt > 0
and p1.sin = a.sin
group by p1.sin
, p1.pen_year
, p2.pen_year
, p1.month_cnt
having count(p3.pen_year) = p2.pen_year  p1.pen_year + 1
and sum(p3.month_cnt) between 60 and 60 + p1.month_cnt  1
) b
)
Alan
CELKO wrote: Mother Celko's Monday SQL Puzzle #3
I am gathering material for a second edition of SQL PUZZLES & ANSWERS. The easiest way to do this is to post a puzzle and harvest answers. The solvers get fame (15 seconds, not minutes), glory and their name in the book. The first edition did a lot of SQL86 code; I would like to have answers which use more of the SQL92, SQL99 syntax.
===========
Luke Tymowski, a Canadian programmer, posted an interesting problem on the MSACCESS forum on CompuServe in 1994 November. He was working on a pension fund problem. In SQL92, the table involved would look like this:
CREATE TABLE Pensions (sin CHAR(10) NOT NULL, pen_year INTEGER NOT NULL, month_cnt INTEGER DEFAULT 0 NOT NULL CHECK (month_cnt BETWEEN 0 AND 12), earnings DECIMAL (8,2) DEFAULT 0.00 NOT NULL);
The SIN column is the Social Insurance Number, which is something like the SSN Social Security Number used in the United States to identify taxpayers. The pen_year column is the calendar year of the pension, the month_cnt column is the number of months in that year the person worked, and earnings is their total earnings for year.
The problem is to find the total earnings of each employee for the most recent 60 months of month_cnt in consecutive years. This number is used to compute their pension. The shortest period going back could be five years with 12 months in each year applying to the total month_cnt. The longest period could be 60 years with 1 month in each year. Some people might work four years and not the fifth, and thus not qualify for a pension at all.
The reason this is a beast to solve is that "most recent" and "consecutive" are hard to write in SQL.
HINT: For each employee in each year, insert a row even in the years they did not work. It not only makes the query easier, but you also have a record to update when you get in new information.
HINT What about the SQL99 OLAP functions?   
.... or, using common table expressions:
with a as (select row_number() over (order by p1.sin, p2.pen_year)
row_number
, p1.sin
, p1.pen_year first_year
, p2.pen_year last_year
, p1.month_cnt first_year_month_cnt
, p1.earnings first_year_earnings
, count(p3.pen_year) year_cnt
, sum(p3.month_cnt) total_month_cnt
, sum(p3.earnings) total_earnings
from pensions p1
inner join pensions p2
on p1.sin = p2.sin
inner join pensions p3
on p1.sin = p3.sin
where p3.pen_year between p1.pen_year and p2.pen_year
and p3.month_cnt > 0
group by p1.sin
, p1.pen_year
, p2.pen_year
, p1.month_cnt
, p1.earnings
having count(p3.pen_year) = p2.pen_year  p1.pen_year + 1
and sum(p3.month_cnt) between 60 and 60 + p1.month_cnt  1
)
select sin
, total_earnings
from a parent
where not exists (select * from a where sin = parent.sin and
row_number > parent.row_number)
Alan Samet wrote: Assuming my solution is correct, getting the 60 months in consecutive years was pretty straightforward. However, I couldn't avoid using a subquery to find the most recent of those 60 month blocks. The subquery itself includes the results you're looking for, only it doesn't reduce it to the most recent block of years. I also added a column to my result set to adjust the first year's earnings for the percentage that could apply to the pension (i.e. person works 6 years, for a total of 71 months, subtract the the first year's earnings * (11 / 12) from the total_earnings). Here is my solution:
select * , total_month_cnt % 12 nonutilized_first_year_month_cnt , (total_month_cnt % 12) / (first_year_month_cnt * 1.0) * first_year_earnings first_year_adjustment , total_earnings  (total_month_cnt % 12) / (first_year_month_cnt * 1.0) * first_year_earnings adjusted_total_earnings from ( select p1.sin , p1.pen_year first_year , p2.pen_year last_year , p1.month_cnt first_year_month_cnt , p1.earnings first_year_earnings , count(p3.pen_year) year_cnt , sum(p3.month_cnt) total_month_cnt , sum(p3.earnings) total_earnings from pensions p1 inner join pensions p2 on p1.sin = p2.sin inner join pensions p3 on p1.sin = p3.sin where p3.pen_year between p1.pen_year and p2.pen_year and p3.month_cnt > 0 group by p1.sin , p1.pen_year , p2.pen_year , p1.month_cnt , p1.earnings having count(p3.pen_year) = p2.pen_year  p1.pen_year + 1 and sum(p3.month_cnt) between 60 and 60 + p1.month_cnt  1 ) a where a.last_year = ( select max(last_year) from ( select p2.pen_year last_year from pensions p1 inner join pensions p2 on p1.sin = p2.sin inner join pensions p3 on p1.sin = p3.sin where p3.pen_year between p1.pen_year and p2.pen_year and p3.month_cnt > 0 and p1.sin = a.sin group by p1.sin , p1.pen_year , p2.pen_year , p1.month_cnt having count(p3.pen_year) = p2.pen_year  p1.pen_year + 1 and sum(p3.month_cnt) between 60 and 60 + p1.month_cnt  1 ) b )
Alan
CELKO wrote: Mother Celko's Monday SQL Puzzle #3
I am gathering material for a second edition of SQL PUZZLES & ANSWERS. The easiest way to do this is to post a puzzle and harvest answers. The solvers get fame (15 seconds, not minutes), glory and their name in the book. The first edition did a lot of SQL86 code; I would like to have answers which use more of the SQL92, SQL99 syntax.
===========
Luke Tymowski, a Canadian programmer, posted an interesting problem on the MSACCESS forum on CompuServe in 1994 November. He was working on a pension fund problem. In SQL92, the table involved would look like this:
CREATE TABLE Pensions (sin CHAR(10) NOT NULL, pen_year INTEGER NOT NULL, month_cnt INTEGER DEFAULT 0 NOT NULL CHECK (month_cnt BETWEEN 0 AND 12), earnings DECIMAL (8,2) DEFAULT 0.00 NOT NULL);
The SIN column is the Social Insurance Number, which is something like the SSN Social Security Number used in the United States to identify taxpayers. The pen_year column is the calendar year of the pension, the month_cnt column is the number of months in that year the person worked, and earnings is their total earnings for year.
The problem is to find the total earnings of each employee for the most recent 60 months of month_cnt in consecutive years. This number is used to compute their pension. The shortest period going back could be five years with 12 months in each year applying to the total month_cnt. The longest period could be 60 years with 1 month in each year. Some people might work four years and not the fifth, and thus not qualify for a pension at all.
The reason this is a beast to solve is that "most recent" and "consecutive" are hard to write in SQL.
HINT: For each employee in each year, insert a row even in the years they did not work. It not only makes the query easier, but you also have a record to update when you get in new information.
HINT What about the SQL99 OLAP functions?   
Alan Samet wrote: ... or, using common table expressions:
with a as (select row_number() over (order by p1.sin, p2.pen_year) row_number
....[snip]... Alan
Damn! Beaten to it...
CELKO wrote: Mother Celko's Monday SQL Puzzle #3
....[snip]... HINT: For each employee in each year, insert a row even in the years they did not work. It not only makes the query easier, but you also have a record to update when you get in new information.
This hint is indeed helpful
HINT What about the SQL99 OLAP functions?
However, I'm not so sure this one is. I started out playing around with
the following expression:
SELECT
SIN,
PEN_YEAR,
MONTH_CNT,
SUM(MONTH_CNT) OVER (
PARTITION BY SIN
ORDER BY PEN_YEAR DESC
) AS MONTH_CUME,
EARNINGS,
SUM(EARNINGS) OVER (
PARTITION BY SIN
ORDER BY PEN_YEAR DESC
) AS EARNINGS_CUME
FROM PENSIONS
However, trying to find a way to limit the cumulative SUMs to range
over consecutive runs of records with MONTH_CNT > 0 seems to be
impossible (at least in DB2s implementation). Assume one adds a "RESET"
field which is 'Y' when MONTH_CNT = 0 and 'N' otherwise:
WITH P AS (
SELECT
SIN,
PEN_YEAR,
MONTH_CNT,
CASE MONTH_CNT WHEN 0 THEN 'Y' ELSE 'N' END AS MONTH_RESET,
EARNINGS
FROM PENSIONS
)
....
Unfortunately this doesn't help much: You can't partition on SIN,
MONTH_RESET as you'll just wind up with nonconsecutive runs on
PEN_YEAR. The ROWS and RANGE clauses of the OLAP functions won't help
either (we're not dealing with a fixed offset of PEN_YEAR).
If the aggregationwindow could be limited by a searchcondition
instead of fixed row or key offsets it would be easy, but it doesn't
look like that's possible. Hence, I don't think OLAP functions are the
answer here.
Eventually I hit on the same idea that Paul's used (joining the table
to itself a couple of times, one to form the start of the range,
another to form the end of the range, and a third to aggregate across
the range). Naturally I ran into the same problem as Paul mentions in
his post: that you wind up with several potential ranges of consecutive
years which have at least 60 months, and you only want the last one.
Instead of using the ROW_NUMBER OLAPfunction, I just used a second
subquery with MAX() to fix this:
WITH
RANGES AS (
SELECT
P1.SIN AS SIN,
P1.PEN_YEAR AS FIRST_YEAR,
P2.PEN_YEAR AS LAST_YEAR,
SUM(P3.EARNINGS) AS EARNINGS_SUM
FROM
PENSIONS P1
INNER JOIN PENSIONS P2
ON P1.SIN = P2.SIN
INNER JOIN PENSIONS P3
ON P1.SIN = P3.SIN
WHERE
P3.PEN_YEAR BETWEEN P1.PEN_YEAR AND P2.PEN_YEAR
AND P3.MONTH_CNT > 0
GROUP BY
P1.SIN,
P1.PEN_YEAR,
P2.PEN_YEAR,
P1.MONTH_CNT
HAVING
SUM(P3.MONTH_CNT) BETWEEN 60 AND 60 + P1.MONTH_CNT  1
AND COUNT(P3.PEN_YEAR) = P2.PEN_YEAR  P1.PEN_YEAR + 1
),
LAST_RANGE AS (
SELECT
SIN AS SIN,
MAX(LAST_YEAR) AS LAST_YEAR
FROM RANGES
GROUP BY SIN
)
SELECT R.*
FROM
RANGES R
INNER JOIN LAST_RANGE L
ON R.SIN = L.SIN
AND R.LAST_YEAR = L.LAST_YEAR
Paul's solution is more complete given that it includes the fields that
would be required to normalize the result by excluding excess months
from the first year (I hadn't thought about that).
Incidentally, if anyone wants to play with this, here's a little Python
script I bashed together to generate some random data for the table
(alter the sin_count variable if you want data for more than 10 people):
#!/bin/env python
# vim: set noet sw=4 ts=4:
import random
from decimal import Decimal
def randInt(min, max):
return random.randint(min, max)
def randDecimal(prec, scale):
return Decimal(random.randint(0, 10 ** prec  1)) / (10 ** scale)
def randString(len, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"):
return ''.join([random.choice(alphabet) for i in xrange(len)])
def main():
sin_count = 10
print "INSERT INTO PENSIONS"
print "(SIN, PEN_YEAR, MONTH_CNT, EARNINGS)"
print "VALUES"
first = True
for sin in set([randString(10) for i in xrange(sin_count)]):
for pen_year in range(2006randInt(0, 60), 2006):
month_cnt = randInt(0, 12)
earnings = randDecimal(6, 2) * month_cnt
prefix = [",", " "][first]
first = False
print "%s('%s', %4d, %2d, %9.2f)" % (
prefix, sin, pen_year, month_cnt, earnings)
print ";"
if __name__ == "__main__":
main()
Enjoy!
Dave.
   
Dave Hughes wrote: Alan Samet wrote:
... or, using common table expressions:
with a as (select row_number() over (order by p1.sin, p2.pen_year) row_number ...[snip]... Alan
Damn! Beaten to it...
Paul's solution is more complete given that it includes the fields
Sorry ... *Alan's* solution (brain trying to deal with multiple threads
and failing!)
Dave.
   
it seems you have already got a solution
(
DBMS, January 1998 http://www.dbmsmag.com/9801d06.html
)
WITH P(sin, pen_year, earnings) AS(
SELECT P1.sin, P1.pen_year, P1.earnings
FROM Pensions AS P1
INNER JOIN Pensions AS P2
ON P1.sin = P2.sin
AND P1.pen_year <= P2.pen_year
GROUP BY P1.sin, P1.pen_year, P1.month_cnt, P1.earnings
HAVING SUM(P2.month_cnt)  P1.month_cnt < 60)
SELECT sin, MIN(pen_year) AS set_year,
MAX(pen_year) AS end_year,
SUM(earnings) AS total_earnings
FROM P
GROUP BY sin;

Andrey Odegov av******@yandex.ru
(remove GOV to respond)   
it seems the thing is more complicated
WITH P4(sin, range, set_year, end_year, month_tally, total_earnings)
AS(
SELECT sin, range, MIN(pen_year), MAX(pen_year), SUM(month_cnt),
SUM(earnings)
FROM (SELECT P1.sin, SUM(CASE WHEN P2.sin IS NULL THEN 0 ELSE 1 END),
P1.pen_year, P1.month_cnt, P1.earnings
FROM Pensions AS P1
LEFT OUTER JOIN Pensions AS P2
ON P2.sin = P1.sin
AND P2.pen_year < P1.pen_year AND P2.month_cnt = 0
WHERE P1.month_cnt <> 0
GROUP BY P1.sin, P1.pen_year, P1.month_cnt, P1.earnings
) AS P3(sin, range, pen_year, month_cnt, earnings)
GROUP BY sin, range
HAVING SUM(month_cnt) >= 60)
SELECT sin, set_year, end_year, month_tally, total_earnings
FROM P4
WHERE NOT EXISTS(SELECT *
FROM P4 AS P5
WHERE P5.sin = P4.sin
AND P5.range > P4.range);

Andrey Odegov av******@yandex.ru
(remove GOV to respond)   This discussion thread is closed Replies have been disabled for this discussion. Similar topics
18 posts
views
Thread by Guinness Mann 
last post: by

3 posts
views
Thread by Nick Chan 
last post: by

11 posts
views
Thread by CELKO 
last post: by
   
reply
views
Thread by DA Morgan 
last post: by

5 posts
views
Thread by Chris 
last post: by
           