>> Newbie on SQL and suffering through this. <<
Let me do a cut & paste on a trick you should consider using on this
kind of crappy denormalized data; sorting it within the row.
=========
Another trick is the Bose-Nelson sort ("A Sorting Problem" by R. C.
Bose and R. J. Nelson; Journal of the ACM, vol. 9 pages 282-296),
which I had written about in DR. DOBB'S JOURNAL back in 1985. This is
a recursive procedure that takes an integer and then generates swap
pairs for a vector of that size. A swap pair is a pair of position
numbers from 1 to (n) in the vector which need to be exchanged if they
are out of order. This swap pairs are also related to Sorting
Networks in the literature (see THE ART OF COMPUTER PROGRAMMING by
Donald Knuth, vol 3).
You are probably thinking that this method is a bit weak because the
results are only good for sorting a fixed number of items. But a
table only has a fixed number of columns, so that is not a problem in
denormalized SQL.
You can set up a sorting network that will sort five items, with the
minimal number of exchanges, nine swaps, like this:
swap (c1, c2);
swap (c4, c5);
swap (c3, c5);
swap (c3, c4);
swap (c1, c4);
swap (c1, c3);
swap (c2, c5);
swap (c2, c4);
swap (c2, c3);
You might want to deal yourself a hand of five playing cards in one
suit to see how it works. Put the cards face down on the table and
pick up the pairs, swapping them if required, then turn over the row
to see that it is in sorted order when you are done.
In theory, the minimum number of swaps needed to sort (n) items is
CEILING (log2 (n!)) and as (n) increases, this approaches
O(n*log2(n)). The Computer Science majors will remember that "Big O"
expression as the expected performance of the best sorting algorithms,
such as Quicksort. The Bose-Nelson method is very good for small
values of (n). If (n < 9) then it is perfect, actually. But as
things get bigger, Bose-Nelson approaches O(n ^ 1.585). In English,
this method is good for a fixed size list of 16 or fewer items and
goes to hell after that.
You can write a version of the Bose-Nelson procedure which will output
the SQL code for a given value of (n). The obvious direct way to do a
swap() is to write a chain of UPDATE statements. Remember that in
SQL, the SET clause assignments happen in parallel, so you can easily
write a SET clause that exchanges the two items when are out of order.
Using the above swap chain, we get this block of code:
BEGIN ATOMIC
-- swap (c1, c2);
UPDATE Foobar
SET c1 = c2, c2 = c1
WHERE c1 > c2;
-- swap (c4, c5);
UPDATE Foobar
SET c4 = c5, c5 = c4
WHERE c4 > c5;
-- swap (c3, c5);
UPDATE Foobar
SET c3 = c5, c5 = c3
WHERE c3 > c5;
-- swap (c3, c4);
UPDATE Foobar
SET c3 = c4, c4 = c3
WHERE c3 > c4;
-- swap (c1, c4);
UPDATE Foobar
SET c1 = c4, c4 = c1
WHERE c1 > c4;
-- swap (c1, c3);
UPDATE Foobar
SET c1 = c3, c3 = c1
WHERE c1 > c3;
-- swap (c2, c5);
UPDATE Foobar
SET c2 = c5, c5 = c2
WHERE c2 > c5;
-- swap (c2, c4);
UPDATE Foobar
SET c2 = c4, c4 = c2
WHERE c2 > c4;
-- swap (c2, c3);
UPDATE Foobar
SET c2 = c3, c3 = c2
WHERE c2 > c3;
END;
This fully portable, standard SQL code and it can be machine
generated. But that parallelism is useful. It is worthwhile to
combine some of the UPDATE statements. But you have to be careful not
to change the effective sequence of the swap operations.
If you look at the first two UPDATE statements, you can see that they
do not overlap. This means you could roll them into one statement
like this:
-- swap (c1, c2) AND swap (c4, c5);
UPDATE Foobar
SET c1 = CASE WHEN c1 <= c2 THEN c1 ELSE c2 END,
c2 = CASE WHEN c1 <= c2 THEN c2 ELSE c1 END,
c4 = CASE WHEN c4 <= c5 THEN c4 ELSE c5 END,
c5 = CASE WHEN c4 <= c5 THEN c5 ELSE c4 END
WHERE c4 > c5 OR c1 > c2;
The advantage of doing this is that you have to execute only one
UPDATE statement and not two. Updating a table, even on non-key
columns, usually locks the table and prevents other users from getting
to the data. If you could roll the statements into one single UPDATE,
you would have the best of all possible worlds, but I doubt that the
code would be easy to read.
=================
I also cannot stand the useless table names, so let me change them as
if human beings reading the code mattered.
-- since it has no key, it is not a table!! Not 1NF either!
CREATE TABLE PeopleSkills
(name VARCHAR(10) NOT NULL,
code1 INTEGER NOT NULL,
code2 INTEGER NOT NULL,
code3 INTEGER NOT NULL,
code4 INTEGER NOT NULL,
code5 INTEGER NOT NULL);
DELETE FROM PeopleSkills;
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 8, 9, 13);
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 8, 9, 13);
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 7, 7, 13);
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 8, 9, 13);
INSERT INTO PeopleSkills VALUES ('joe', 1, 7, 8, 9, 3);
INSERT INTO PeopleSkills VALUES ('bob', 1, 7, 8, 9, 3);
INSERT INTO PeopleSkills VALUES ('larry', 22, 17, 18, 19, 113); -- 0
target codes
INSERT INTO PeopleSkills VALUES ('mary', 1, 3, 2, 9, 13);
INSERT INTO PeopleSkills VALUES ('melvin', 1, 3, 2, 9, 13); -- 2
target codes
INSERT INTO PeopleSkills VALUES ('irving', 1, 8, 2, 9, 13); -- 1
target codes
CREATE TABLE TargetCodes
(code INTEGER NOT NULL PRIMARY KEY,
descript VARCHAR(50) NOT NULL);
INSERT INTO TargetCodes VALUES (1, 'code1')
INSERT INTO TargetCodes VALUES (3, 'code3')
INSERT INTO TargetCodes VALUES (7, 'code7')
I need to produce report:
name code count (of names that contain at least 1 code in Table2)<<
This will give you just the names:
SELECT DISTINCT name
FROM PeopleSkills AS P1
WHERE code1 IN (SELECT code FROM TargetCodes)
OR code2 IN (SELECT code FROM TargetCodes)
OR code3 IN (SELECT code FROM TargetCodes)
OR code4 IN (SELECT code FROM TargetCodes)
OR code5 IN (SELECT code FROM TargetCodes);
This modification will shown you which codes each person has, with 1/0
flags. This has aneat trick with little-used SUM(DISTINCT)
construction, but you have to know what the target codes are in
advance.
SELECT name,
SUM(DISTINCT CASE WHEN 1 IN (code1, code2, code3,code4,code5)
THEN 1 ELSE 0 END) AS code1,
SUM(DISTINCT CASE WHEN 3 IN (code1, code2, code3,code4,code5)
THEN 1 ELSE 0 END) AS code3,
SUM(DISTINCT CASE WHEN 7 IN (code1, code2, code3,code4,code5)
THEN 1 ELSE 0 END) AS code7
FROM PeopleSkills AS P1
WHERE code1 IN (SELECT code FROM TargetCodes)
OR code2 IN (SELECT code FROM TargetCodes)
OR code3 IN (SELECT code FROM TargetCodes)
OR code4 IN (SELECT code FROM TargetCodes)
OR code5 IN (SELECT code FROM TargetCodes)
GROUP BY name;
But I think you wanted a count of how many of the code appear in each
name, which is pretty easy:
SELECT P1.name, COUNT(DISTINCT T1.code)
FROM PeopleSkills AS P1,
TargetCodes AS T1
WHERE T1.code IN (code1, code2, code3, code4, code5)
GROUP BY name;
I would do it this way, to get the zeroes:
SELECT P1.name, COUNT(DISTINCT T1.code)
FROM PeopleSkills AS P1
LEFT OUTER JOIN
TargetCodes AS T1
ON T1.code IN (code1, code2, code3, code4, code5)
GROUP BY name;