"Indraneel Sheorey" <in***************@dartmouth.edu> wrote in message
news:35*************************@posting.google.co m...
Hello,
I want to set up a query in SQL Server that is "grouped by" a variable
PRICE. Since PRICE takes on continuous decimal values, I want to
create deciles based on this variable and then display the average
price in each decile. As background, PRICE is a calculated quantity:
I divide a table field SLS_DLRS by a field SLS_UNTS to get it. I also
want to include an average SLS_UNTS for each decile.
So essentially, I want the result to be something like:
DECILE AVG_PRICE AVG_SLS_UNTS
1 0.50 5.2
2 1.50 4.7
... ... ...
10 9.50 1.1
Is there a way to do this in an SQL statement?
Thanks in advance,
Indraneel
There are different methods, with varying degrees of accuracy, to
calculate percentiles. Here's one reasonable method described through
an example. Consider the 50th percentile (5th decile or median) for the
following numbers, arranged in increasing order:
3, 5, 7, 8, 9, 11, 13, 15
1. Compute the rank R of the 50th percentile.
R = P / 100 * (N + 1)
P = the desired percentile
N = number of numbers in the collection
R = 50 / 100 * (8 + 1) = 9 / 2 = 4.5
2. When R is an integer, the Pth percentile would be the number with
rank R.
3. When R is not an integer, as in this case, the Pth percentile is
computed by interpolation as follows:
a. Define IR as the integer portion of R. For this example, IR is 4.
b. Define FR as the fractional portion of R. For this example, FR is .5.
c. Find the values in the sequence with rank IR and IR + 1. For example,
this means the value with rank 4 and the value with rank 5, which
are 8 and 9, respectively.
d. Interpolate by multiplying the difference between the scores by FR
and adding the result to the lower score. For this example, this is
.5 * (9 - 8) + 8 = 8.5
Therefore, the 50th percentile is 8.5.
CREATE TABLE T
(
sls_dlrs DECIMAL (8, 2) NOT NULL CHECK (sls_dlrs > 0),
sls_unts INT NOT NULL CHECK (sls_unts > 0)
)
CREATE VIEW TPrices (sls_dlrs, sls_unts, price)
AS
SELECT sls_dlrs, sls_unts, sls_dlrs / sls_unts
FROM T
-- Rank prices in increasing order. For a given price P in an ordered collection,
-- give both the rank of the first occurrence of P and the rank of the last
-- occurrence of P. The first price P in an ordered collection has rank 1.
CREATE VIEW IncreasingOrderRanks (price, rank_first, rank_last)
AS
SELECT P1.price,
COUNT(*) - P1.occurrences + 1,
COUNT(*)
FROM (SELECT price, COUNT(*) AS occurrences
FROM TPrices
GROUP BY price) AS P1
INNER JOIN
TPrices AS P2
ON P2.price <= P1.price
GROUP BY P1.price, P1.occurrences
CREATE TABLE Digits
(
d INT NOT NULL PRIMARY KEY CHECK (d BETWEEN 0 AND 9)
)
INSERT INTO Digits (d)
VALUES (0)
INSERT INTO Digits (d)
VALUES (1)
INSERT INTO Digits (d)
VALUES (2)
INSERT INTO Digits (d)
VALUES (3)
INSERT INTO Digits (d)
VALUES (4)
INSERT INTO Digits (d)
VALUES (5)
INSERT INTO Digits (d)
VALUES (6)
INSERT INTO Digits (d)
VALUES (7)
INSERT INTO Digits (d)
VALUES (8)
INSERT INTO Digits (d)
VALUES (9)
-- Need to generate natural numbers up to 100 to represent whole number
-- percentile ranks
CREATE VIEW PercentileRanks (rank)
AS
SELECT Ones.d + 10 * Tens.d + 1
FROM Digits AS Ones
CROSS JOIN
Digits AS Tens
CREATE VIEW Percentiles (percentile, rank)
AS
SELECT CASE WHEN PR.rank * (RN.rank + 1) / 100.0 < R1.rank
THEN R1.price
WHEN PR.rank * (RN.rank + 1) / 100.0 > RN.rank
THEN RN.price
WHEN PR.rank * (RN.rank + 1) / 100.0 >
CAST(PR.rank * (RN.rank + 1) / 100.0 AS INT)
THEN
(SELECT CASE WHEN R1.price = R2.price
THEN R1.price
ELSE (PR.rank * (RN.rank + 1) / 100.0 -
CAST(PR.rank * (RN.rank + 1) / 100.0 AS INT)) *
(R2.price - R1.price) + R1.price
END
FROM IncreasingOrderRanks AS R1
INNER JOIN
IncreasingOrderRanks AS R2
ON CAST(PR.rank * (RN.rank + 1) / 100.0 AS INT)
BETWEEN R1.rank_first AND R1.rank_last AND
CAST(PR.rank * (RN.rank + 1) / 100.0 AS INT) + 1
BETWEEN R2.rank_first AND R2.rank_last)
ELSE (SELECT price
FROM IncreasingOrderRanks
WHERE CAST(PR.rank * (RN.rank + 1) / 100.0 AS INT)
BETWEEN rank_first AND rank_last)
END,
PR.rank
FROM PercentileRanks AS PR
CROSS JOIN
(SELECT 1, price
FROM IncreasingOrderRanks
WHERE rank_first = 1) AS R1(rank, price)
CROSS JOIN
(SELECT MAX(rank_last), MAX(price)
FROM IncreasingOrderRanks) AS RN(rank, price)
CREATE VIEW Deciles (decile, rank)
AS
SELECT percentile, rank / 10
FROM Percentiles
WHERE rank % 10 = 0
-- The desired query
SELECT D1.rank, AVG(P.price) AS avg_price, AVG(P.sls_unts) AS avg_sls_unts
FROM Deciles AS D1
LEFT OUTER JOIN
Deciles AS D2
ON D2.rank = D1.rank - 1
INNER JOIN
TPrices AS P
ON P.price <= D1.decile AND
(D2.decile IS NULL OR P.price > D2.decile)
GROUP BY D1.rank
ORDER BY D1.rank
Regards,
jag