By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,203 Members | 1,358 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,203 IT Pros & Developers. It's quick & easy.

somewhat painful join code

P: n/a
Hello.

Newbie on SQL and suffering through this.

I have two tables created as such:

drop table table1;
go
drop table table2;
go
create table table1(
name varchar(10),
code1 integer,
code2 integer,
code3 integer,
code4 integer,
code5 integer
);
go
create table table2(
code integer,
descript varchar(50)
);
go
INSERT INTO table1 VALUES ('mary',1,7,8,9,13)
INSERT INTO table1 VALUES ('mary',1,7,8,9,13)
INSERT INTO table1 VALUES ('mary',1,7,7,7,13)
INSERT INTO table1 VALUES ('mary',1,7,8,9,13)
INSERT INTO table1 VALUES ('joe',1,7,8,9,3)
INSERT INTO table1 VALUES ('bob',1,7,8,9,3)
INSERT INTO table1 VALUES ('larry',22,17,18,19,113)
INSERT INTO table1 VALUES ('mary',1,3,2,9,13)
INSERT INTO table2 VALUES (1,'code1')
INSERT INTO table2 VALUES (3,'code3')
INSERT INTO table2 VALUES (7,'code7')
go

Table1 will have duplicate name entries and code1 - code5 could
contain any range of numeric codes, including dups in the same row or
other rows.

Table2 sorted and indexed unique by code contains only the codes I'm
interested in reporting.

I need to produce report:

name code count (of names that contain at least 1 code in Table2)

I'll go back to suffering through it now, but I thought I'd post since
I'm not exactly sure where to begin.

Thank you for any help or information!

My above email is no longer active, so please post to the list so that
everyone can benefit and your contribution lives on.
Jul 20 '05 #1
Share this Question
Share on Google+
4 Replies


P: n/a
Here's the query:

SELECT T1.name, COUNT(*)
FROM Table2 AS T2
JOIN
(SELECT name, code1
FROM Table1
UNION ALL
SELECT name, code2
FROM Table1
UNION ALL
SELECT name, code3
FROM Table1
UNION ALL
SELECT name, code4
FROM Table1
UNION ALL
SELECT name, code5
FROM Table1) AS T1(name,code)
ON T1.code = T2.code
GROUP BY T1.name

But your tables need a complete redesign. I'm afraid there's so much wrong
here and without knowing more about the situation you are modelling it's
difficult to know where to start or even whether the above solution is
meaningful. If you're new to relational databases then read up on database
design and Normalization, or get whoever's responsible to fix the design at
the earliest opportunity.

Hope this helps.

--
David Portas
------------
Please reply only to the newsgroup
--
Jul 20 '05 #2

P: n/a
Hello David, thanks for response.

These two tables where created from flat files just to process them in
sql2000. I am new to SQL. It was not my intention to insure data
integrity or set constraints with the tables. Prior to your response,
I wrote the following and did a quick spot check on the output and it
appears to have produced what I wanted. I'm sure there's a good
explanation - but why and how is the result different from yours?

SELECT Table1.name, Table2.code, count(*) FROM Table1
INNER JOIN Table2 ON table1.code1=table2.code
or table1.code2 = table2.code or table1.code3=table2.code
or table1.code4=table2.code or table1.code5=table2.code
group by table1.name,table2.code
Thanks again!
Jul 20 '05 #3

P: n/a
My query counts the number of occurrences of each code. Your query counts
the number of rows on which a code appears. If the same code appears
multiple times on the same row then your query will only count it once
whereas mine will count it multiple times.

--
David Portas
------------
Please reply only to the newsgroup
--
Jul 20 '05 #4

P: n/a
>> Newbie on SQL and suffering through this. <<

Let me do a cut & paste on a trick you should consider using on this
kind of crappy denormalized data; sorting it within the row.

=========
Another trick is the Bose-Nelson sort ("A Sorting Problem" by R. C.
Bose and R. J. Nelson; Journal of the ACM, vol. 9 pages 282-296),
which I had written about in DR. DOBB'S JOURNAL back in 1985. This is
a recursive procedure that takes an integer and then generates swap
pairs for a vector of that size. A swap pair is a pair of position
numbers from 1 to (n) in the vector which need to be exchanged if they
are out of order. This swap pairs are also related to Sorting
Networks in the literature (see THE ART OF COMPUTER PROGRAMMING by
Donald Knuth, vol 3).

You are probably thinking that this method is a bit weak because the
results are only good for sorting a fixed number of items. But a
table only has a fixed number of columns, so that is not a problem in
denormalized SQL.

You can set up a sorting network that will sort five items, with the
minimal number of exchanges, nine swaps, like this:

swap (c1, c2);
swap (c4, c5);
swap (c3, c5);
swap (c3, c4);
swap (c1, c4);
swap (c1, c3);
swap (c2, c5);
swap (c2, c4);
swap (c2, c3);

You might want to deal yourself a hand of five playing cards in one
suit to see how it works. Put the cards face down on the table and
pick up the pairs, swapping them if required, then turn over the row
to see that it is in sorted order when you are done.

In theory, the minimum number of swaps needed to sort (n) items is
CEILING (log2 (n!)) and as (n) increases, this approaches
O(n*log2(n)). The Computer Science majors will remember that "Big O"
expression as the expected performance of the best sorting algorithms,
such as Quicksort. The Bose-Nelson method is very good for small
values of (n). If (n < 9) then it is perfect, actually. But as
things get bigger, Bose-Nelson approaches O(n ^ 1.585). In English,
this method is good for a fixed size list of 16 or fewer items and
goes to hell after that.

You can write a version of the Bose-Nelson procedure which will output
the SQL code for a given value of (n). The obvious direct way to do a
swap() is to write a chain of UPDATE statements. Remember that in
SQL, the SET clause assignments happen in parallel, so you can easily
write a SET clause that exchanges the two items when are out of order.
Using the above swap chain, we get this block of code:

BEGIN ATOMIC
-- swap (c1, c2);
UPDATE Foobar
SET c1 = c2, c2 = c1
WHERE c1 > c2;

-- swap (c4, c5);
UPDATE Foobar
SET c4 = c5, c5 = c4
WHERE c4 > c5;

-- swap (c3, c5);
UPDATE Foobar
SET c3 = c5, c5 = c3
WHERE c3 > c5;

-- swap (c3, c4);
UPDATE Foobar
SET c3 = c4, c4 = c3
WHERE c3 > c4;

-- swap (c1, c4);
UPDATE Foobar
SET c1 = c4, c4 = c1
WHERE c1 > c4;

-- swap (c1, c3);
UPDATE Foobar
SET c1 = c3, c3 = c1
WHERE c1 > c3;

-- swap (c2, c5);
UPDATE Foobar
SET c2 = c5, c5 = c2
WHERE c2 > c5;

-- swap (c2, c4);
UPDATE Foobar
SET c2 = c4, c4 = c2
WHERE c2 > c4;

-- swap (c2, c3);
UPDATE Foobar
SET c2 = c3, c3 = c2
WHERE c2 > c3;

END;

This fully portable, standard SQL code and it can be machine
generated. But that parallelism is useful. It is worthwhile to
combine some of the UPDATE statements. But you have to be careful not
to change the effective sequence of the swap operations.

If you look at the first two UPDATE statements, you can see that they
do not overlap. This means you could roll them into one statement
like this:

-- swap (c1, c2) AND swap (c4, c5);
UPDATE Foobar
SET c1 = CASE WHEN c1 <= c2 THEN c1 ELSE c2 END,
c2 = CASE WHEN c1 <= c2 THEN c2 ELSE c1 END,
c4 = CASE WHEN c4 <= c5 THEN c4 ELSE c5 END,
c5 = CASE WHEN c4 <= c5 THEN c5 ELSE c4 END
WHERE c4 > c5 OR c1 > c2;

The advantage of doing this is that you have to execute only one
UPDATE statement and not two. Updating a table, even on non-key
columns, usually locks the table and prevents other users from getting
to the data. If you could roll the statements into one single UPDATE,
you would have the best of all possible worlds, but I doubt that the
code would be easy to read.

=================
I also cannot stand the useless table names, so let me change them as
if human beings reading the code mattered.
-- since it has no key, it is not a table!! Not 1NF either!
CREATE TABLE PeopleSkills
(name VARCHAR(10) NOT NULL,
code1 INTEGER NOT NULL,
code2 INTEGER NOT NULL,
code3 INTEGER NOT NULL,
code4 INTEGER NOT NULL,
code5 INTEGER NOT NULL);

DELETE FROM PeopleSkills;
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 8, 9, 13);
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 8, 9, 13);
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 7, 7, 13);
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 8, 9, 13);
INSERT INTO PeopleSkills VALUES ('joe', 1, 7, 8, 9, 3);
INSERT INTO PeopleSkills VALUES ('bob', 1, 7, 8, 9, 3);
INSERT INTO PeopleSkills VALUES ('larry', 22, 17, 18, 19, 113); -- 0
target codes
INSERT INTO PeopleSkills VALUES ('mary', 1, 3, 2, 9, 13);
INSERT INTO PeopleSkills VALUES ('melvin', 1, 3, 2, 9, 13); -- 2
target codes
INSERT INTO PeopleSkills VALUES ('irving', 1, 8, 2, 9, 13); -- 1
target codes

CREATE TABLE TargetCodes
(code INTEGER NOT NULL PRIMARY KEY,
descript VARCHAR(50) NOT NULL);
INSERT INTO TargetCodes VALUES (1, 'code1')
INSERT INTO TargetCodes VALUES (3, 'code3')
INSERT INTO TargetCodes VALUES (7, 'code7')
I need to produce report:


name code count (of names that contain at least 1 code in Table2)<<

This will give you just the names:

SELECT DISTINCT name
FROM PeopleSkills AS P1
WHERE code1 IN (SELECT code FROM TargetCodes)
OR code2 IN (SELECT code FROM TargetCodes)
OR code3 IN (SELECT code FROM TargetCodes)
OR code4 IN (SELECT code FROM TargetCodes)
OR code5 IN (SELECT code FROM TargetCodes);

This modification will shown you which codes each person has, with 1/0
flags. This has aneat trick with little-used SUM(DISTINCT)
construction, but you have to know what the target codes are in
advance.

SELECT name,
SUM(DISTINCT CASE WHEN 1 IN (code1, code2, code3,code4,code5)
THEN 1 ELSE 0 END) AS code1,
SUM(DISTINCT CASE WHEN 3 IN (code1, code2, code3,code4,code5)
THEN 1 ELSE 0 END) AS code3,
SUM(DISTINCT CASE WHEN 7 IN (code1, code2, code3,code4,code5)
THEN 1 ELSE 0 END) AS code7
FROM PeopleSkills AS P1
WHERE code1 IN (SELECT code FROM TargetCodes)
OR code2 IN (SELECT code FROM TargetCodes)
OR code3 IN (SELECT code FROM TargetCodes)
OR code4 IN (SELECT code FROM TargetCodes)
OR code5 IN (SELECT code FROM TargetCodes)
GROUP BY name;

But I think you wanted a count of how many of the code appear in each
name, which is pretty easy:

SELECT P1.name, COUNT(DISTINCT T1.code)
FROM PeopleSkills AS P1,
TargetCodes AS T1
WHERE T1.code IN (code1, code2, code3, code4, code5)
GROUP BY name;

I would do it this way, to get the zeroes:
SELECT P1.name, COUNT(DISTINCT T1.code)
FROM PeopleSkills AS P1
LEFT OUTER JOIN
TargetCodes AS T1
ON T1.code IN (code1, code2, code3, code4, code5)
GROUP BY name;
Jul 20 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.