I have a whoel chapter on verious ways to do this in SQL FOR SMARTIES.

Here is one answer.

Median with Characteristic Function

Anatoly Abramovich, Yelena Alexandrova, and Eugene Birger presented a

series of articles in SQL Forum magazine on computing the median (SQL

Forum 1993, 1994). They define a characteristic function, which they

call delta, using the Sybase sign() function. The delta or

characteristic function accepts a Boolean expression as an argument and

returns a 1 if it is TRUE and a zero if it is FALSE or UNKNOWN.

In SQL-92 we have a CASE expression, which can be used to construct the

delta function. This is new to SQL-92, but you can find vendor

functions of the form IF...THEN...ELSE that behave like the condition

expression in Algol or like the question markPcolon operator in C.

The authors also distinguish between the statistical median, whose

value must be a member of the set, and the financial median, whose

value is the average of the middle two members of the set. A

statistical median exists when there is an odd number of items in the

set. If there is an even number of items, you must decide if you want

to use the highest value in the lower half (they call this the left

median) or the lowest value in the upper half (they call this the right

median).

The left statistical median of a unique column can be found with this

query:

SELECT P1.bin

FROM Parts AS P1, Parts AS P2

GROUP BY P1.bin

HAVING SUM(CASE WHEN (P2.bin <= P1.bin) THEN 1 ELSE 0 END)

= (COUNT(*) + 1) / 2;

Changing the direction of the theta test in the HAVING clause will

allow you to pick the right statistical median if a central element

does not exist in the set. You will also notice something else about

the median of a set of unique values: It is usually meaningless. What

does the median bin number mean, anyway? A good rule of thumb is that

if it does not make sense as an average, it does not make sense as a

median.

The statistical median of a column with duplicate values can be found

with a query based on the same ideas, but you have to adjust the HAVING

clause to allow for overlap; thus, the left statistical median is found

by

SELECT P1.weight

FROM Parts AS P1, Parts AS P2

GROUP BY P1.weight

HAVING SUM(CASE WHEN P2.weight <= P1.weight

THEN 1 ELSE 0 END)

= ((COUNT(*) + 1) / 2)
AND SUM(CASE WHEN P2.weight >= P1.weight

THEN 1 ELSE 0 END)= (COUNT(*)/2 + 1);

Notice that here the left and right medians can be the same, so there

is no need to pick one over the other in many of the situations where

you have an even number of items. Switching the comparison operators in

the two CASE expressions will give you the right statistical median.

The author's query for the financial median depends on some Sybase

features that cannot be found in other products, so I would recommend

using a combination of the right and left statistical medians to return

a set of values about the center of the data, and then averaging them,

thus:

SELECT AVG(P1.weight)

FROM Parts AS P1, Parts AS P2

HAVING (SUM(CASE WHEN P2.weight <= P1.weight -- left median

THEN 1 ELSE 0 END)= ((COUNT(*) + 1) / 2)
AND SUM(CASE WHEN P2.weight >= P1.weight

THEN 1 ELSE 0 END)= (COUNT(*)/2 + 1))
OR (SUM(CASE WHEN P2.weight >= P1.weight -- right median

THEN 1 ELSE 0 END)= ((COUNT(*) + 1) / 2)
AND SUM(CASE WHEN P2.weight <= P1.weight

THEN 1 ELSE 0 END)= (COUNT(*)/2 + 1));

An optimizer may be able to reduce this expression internally, since

the expressions involved with COUNT(*) are constants. This entire query

could be put into a FROM clause and the average taken of the one or two

rows in the result to find the financial median. In SQL-89, you would

have to define this as a VIEW and then take the average.

If you have SQL-2005, you can try something like (untested):

SELECT AVG(x),

ROW_NUMBER () OVER (ORDER BY x ASC) AS hi,

ROW_NUMBER () OVER (ORDER BY x DESC) AS lo,

FROM Foobar

WHERE hi IN (lo, lo-1, lo+1);