Ma**********@mailbox.tu-dresden.de wrote:
Hello,
DB2 seems to do a SORT before every GROUP BY action. At least if I look
at the query plan (using Visual Explain in the Command Center) of
-- connected to the sample db
SELECT WORKDEPT, SUM(SALARY)
FROM EMPLOYEE
GROUP BY WORKDEPT
a SORT is performed before GRPBY. I can imagine that it's necessary,
but why?
The rows in your above example could be read in any arbitrary order
(internally). So calculating the SUM for each group that way would mean
that DB2 has to calculate "n" sums in parallel where "n" is the number of
groups (= #workdept). If you do a sort first (by workdept), you can
pipeline things: calculate sum for first group, return result, then
calculate sum for 2nd group, return result, sum for 3rd group, return
result, ... One advantage is that memory consumption for the summation is
known up front and fixed.
Note that you _must not_ rely on GROUP BY doing a SORT internally. A sort
is surely just one option to implement a grouping. For example, if the
DBMS knows up front that you have only 2 workdepts, it may very well pick
the parallel approach. In other words, the _only_ thing that guarantees a
specific order of rows in a result set is an ORDER BY clause. Period.
--
Knut Stolze
DB2 Information Integration Development
IBM Germany