somewhat painful join code - Microsoft SQL Server

jason

Hello.

Newbie on SQL and suffering through this.

I have two tables created as such:

drop table table1;
go
drop table table2;
go
create table table1(
name varchar(10),
code1 integer,
code2 integer,
code3 integer,
code4 integer,
code5 integer
);
go
create table table2(
code integer,
descript varchar(50)
);
go
INSERT INTO table1 VALUES ('mary',1,7,8,9,13)
INSERT INTO table1 VALUES ('mary',1,7,8,9,13)
INSERT INTO table1 VALUES ('mary',1,7,7,7,13)
INSERT INTO table1 VALUES ('mary',1,7,8,9,13)
INSERT INTO table1 VALUES ('joe',1,7,8,9,3)
INSERT INTO table1 VALUES ('bob',1,7,8,9,3)
INSERT INTO table1 VALUES ('larry',22,17,18,19,113)
INSERT INTO table1 VALUES ('mary',1,3,2,9,13)
INSERT INTO table2 VALUES (1,'code1')
INSERT INTO table2 VALUES (3,'code3')
INSERT INTO table2 VALUES (7,'code7')
go

Table1 will have duplicate name entries and code1 - code5 could
contain any range of numeric codes, including dups in the same row or
other rows.

Table2 sorted and indexed unique by code contains only the codes I'm
interested in reporting.

I need to produce report:

name code count (of names that contain at least 1 code in Table2)

I'll go back to suffering through it now, but I thought I'd post since
I'm not exactly sure where to begin.

Thank you for any help or information!

My above email is no longer active, so please post to the list so that
everyone can benefit and your contribution lives on.

Jul 20 '05 #1

Subscribe Post Reply

2409

David Portas

Here's the query:

SELECT T1.name, COUNT(*)
FROM Table2 AS T2
JOIN
(SELECT name, code1
FROM Table1
UNION ALL
SELECT name, code2
FROM Table1
UNION ALL
SELECT name, code3
FROM Table1
UNION ALL
SELECT name, code4
FROM Table1
UNION ALL
SELECT name, code5
FROM Table1) AS T1(name,code)
ON T1.code = T2.code
GROUP BY T1.name

But your tables need a complete redesign. I'm afraid there's so much wrong
here and without knowing more about the situation you are modelling it's
difficult to know where to start or even whether the above solution is
meaningful. If you're new to relational databases then read up on database
design and Normalization, or get whoever's responsible to fix the design at
the earliest opportunity.

Hope this helps.

--
David Portas
------------
Please reply only to the newsgroup
--

Jul 20 '05 #2

jason

Hello David, thanks for response.

These two tables where created from flat files just to process them in
sql2000. I am new to SQL. It was not my intention to insure data
integrity or set constraints with the tables. Prior to your response,
I wrote the following and did a quick spot check on the output and it
appears to have produced what I wanted. I'm sure there's a good
explanation - but why and how is the result different from yours?

SELECT Table1.name, Table2.code, count(*) FROM Table1
INNER JOIN Table2 ON table1.code1=table2.code
or table1.code2 = table2.code or table1.code3=table2.code
or table1.code4=table2.code or table1.code5=table2.code
group by table1.name,table2.code
Thanks again!

Jul 20 '05 #3

David Portas

My query counts the number of occurrences of each code. Your query counts
the number of rows on which a code appears. If the same code appears
multiple times on the same row then your query will only count it once
whereas mine will count it multiple times.

--
David Portas
------------
Please reply only to the newsgroup
--

Jul 20 '05 #4

--CELKO--

>> Newbie on SQL and suffering through this. <<

Let me do a cut & paste on a trick you should consider using on this
kind of crappy denormalized data; sorting it within the row.

=========
Another trick is the Bose-Nelson sort ("A Sorting Problem" by R. C.
Bose and R. J. Nelson; Journal of the ACM, vol. 9 pages 282-296),
which I had written about in DR. DOBB'S JOURNAL back in 1985. This is
a recursive procedure that takes an integer and then generates swap
pairs for a vector of that size. A swap pair is a pair of position
numbers from 1 to (n) in the vector which need to be exchanged if they
are out of order. This swap pairs are also related to Sorting
Networks in the literature (see THE ART OF COMPUTER PROGRAMMING by
Donald Knuth, vol 3).

You are probably thinking that this method is a bit weak because the
results are only good for sorting a fixed number of items. But a
table only has a fixed number of columns, so that is not a problem in
denormalized SQL.

You can set up a sorting network that will sort five items, with the
minimal number of exchanges, nine swaps, like this:

swap (c1, c2);
swap (c4, c5);
swap (c3, c5);
swap (c3, c4);
swap (c1, c4);
swap (c1, c3);
swap (c2, c5);
swap (c2, c4);
swap (c2, c3);

You might want to deal yourself a hand of five playing cards in one
suit to see how it works. Put the cards face down on the table and
pick up the pairs, swapping them if required, then turn over the row
to see that it is in sorted order when you are done.

In theory, the minimum number of swaps needed to sort (n) items is
CEILING (log2 (n!)) and as (n) increases, this approaches
O(n*log2(n)). The Computer Science majors will remember that "Big O"
expression as the expected performance of the best sorting algorithms,
such as Quicksort. The Bose-Nelson method is very good for small
values of (n). If (n < 9) then it is perfect, actually. But as
things get bigger, Bose-Nelson approaches O(n ^ 1.585). In English,
this method is good for a fixed size list of 16 or fewer items and
goes to hell after that.

You can write a version of the Bose-Nelson procedure which will output
the SQL code for a given value of (n). The obvious direct way to do a
swap() is to write a chain of UPDATE statements. Remember that in
SQL, the SET clause assignments happen in parallel, so you can easily
write a SET clause that exchanges the two items when are out of order.
Using the above swap chain, we get this block of code:

BEGIN ATOMIC
-- swap (c1, c2);
UPDATE Foobar
SET c1 = c2, c2 = c1
WHERE c1 > c2;

-- swap (c4, c5);
UPDATE Foobar
SET c4 = c5, c5 = c4
WHERE c4 > c5;

-- swap (c3, c5);
UPDATE Foobar
SET c3 = c5, c5 = c3
WHERE c3 > c5;

-- swap (c3, c4);
UPDATE Foobar
SET c3 = c4, c4 = c3
WHERE c3 > c4;

-- swap (c1, c4);
UPDATE Foobar
SET c1 = c4, c4 = c1
WHERE c1 > c4;

-- swap (c1, c3);
UPDATE Foobar
SET c1 = c3, c3 = c1
WHERE c1 > c3;

-- swap (c2, c5);
UPDATE Foobar
SET c2 = c5, c5 = c2
WHERE c2 > c5;

-- swap (c2, c4);
UPDATE Foobar
SET c2 = c4, c4 = c2
WHERE c2 > c4;

-- swap (c2, c3);
UPDATE Foobar
SET c2 = c3, c3 = c2
WHERE c2 > c3;

END;

This fully portable, standard SQL code and it can be machine
generated. But that parallelism is useful. It is worthwhile to
combine some of the UPDATE statements. But you have to be careful not
to change the effective sequence of the swap operations.

If you look at the first two UPDATE statements, you can see that they
do not overlap. This means you could roll them into one statement
like this:

-- swap (c1, c2) AND swap (c4, c5);
UPDATE Foobar
SET c1 = CASE WHEN c1 <= c2 THEN c1 ELSE c2 END,
c2 = CASE WHEN c1 <= c2 THEN c2 ELSE c1 END,
c4 = CASE WHEN c4 <= c5 THEN c4 ELSE c5 END,
c5 = CASE WHEN c4 <= c5 THEN c5 ELSE c4 END
WHERE c4 > c5 OR c1 > c2;

The advantage of doing this is that you have to execute only one
UPDATE statement and not two. Updating a table, even on non-key
columns, usually locks the table and prevents other users from getting
to the data. If you could roll the statements into one single UPDATE,
you would have the best of all possible worlds, but I doubt that the
code would be easy to read.

=================
I also cannot stand the useless table names, so let me change them as
if human beings reading the code mattered.
-- since it has no key, it is not a table!! Not 1NF either!
CREATE TABLE PeopleSkills
(name VARCHAR(10) NOT NULL,
code1 INTEGER NOT NULL,
code2 INTEGER NOT NULL,
code3 INTEGER NOT NULL,
code4 INTEGER NOT NULL,
code5 INTEGER NOT NULL);

DELETE FROM PeopleSkills;
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 8, 9, 13);
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 8, 9, 13);
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 7, 7, 13);
INSERT INTO PeopleSkills VALUES ('mary', 1, 7, 8, 9, 13);
INSERT INTO PeopleSkills VALUES ('joe', 1, 7, 8, 9, 3);
INSERT INTO PeopleSkills VALUES ('bob', 1, 7, 8, 9, 3);
INSERT INTO PeopleSkills VALUES ('larry', 22, 17, 18, 19, 113); -- 0
target codes
INSERT INTO PeopleSkills VALUES ('mary', 1, 3, 2, 9, 13);
INSERT INTO PeopleSkills VALUES ('melvin', 1, 3, 2, 9, 13); -- 2
target codes
INSERT INTO PeopleSkills VALUES ('irving', 1, 8, 2, 9, 13); -- 1
target codes

CREATE TABLE TargetCodes
(code INTEGER NOT NULL PRIMARY KEY,
descript VARCHAR(50) NOT NULL);
INSERT INTO TargetCodes VALUES (1, 'code1')
INSERT INTO TargetCodes VALUES (3, 'code3')
INSERT INTO TargetCodes VALUES (7, 'code7')

I need to produce report:

name code count (of names that contain at least 1 code in Table2)<<

This will give you just the names:

SELECT DISTINCT name
FROM PeopleSkills AS P1
WHERE code1 IN (SELECT code FROM TargetCodes)
OR code2 IN (SELECT code FROM TargetCodes)
OR code3 IN (SELECT code FROM TargetCodes)
OR code4 IN (SELECT code FROM TargetCodes)
OR code5 IN (SELECT code FROM TargetCodes);

This modification will shown you which codes each person has, with 1/0
flags. This has aneat trick with little-used SUM(DISTINCT)
construction, but you have to know what the target codes are in
advance.

SELECT name,
SUM(DISTINCT CASE WHEN 1 IN (code1, code2, code3,code4,code5)
THEN 1 ELSE 0 END) AS code1,
SUM(DISTINCT CASE WHEN 3 IN (code1, code2, code3,code4,code5)
THEN 1 ELSE 0 END) AS code3,
SUM(DISTINCT CASE WHEN 7 IN (code1, code2, code3,code4,code5)
THEN 1 ELSE 0 END) AS code7
FROM PeopleSkills AS P1
WHERE code1 IN (SELECT code FROM TargetCodes)
OR code2 IN (SELECT code FROM TargetCodes)
OR code3 IN (SELECT code FROM TargetCodes)
OR code4 IN (SELECT code FROM TargetCodes)
OR code5 IN (SELECT code FROM TargetCodes)
GROUP BY name;

But I think you wanted a count of how many of the code appear in each
name, which is pretty easy:

SELECT P1.name, COUNT(DISTINCT T1.code)
FROM PeopleSkills AS P1,
TargetCodes AS T1
WHERE T1.code IN (code1, code2, code3, code4, code5)
GROUP BY name;

I would do it this way, to get the zeroes:
SELECT P1.name, COUNT(DISTINCT T1.code)
FROM PeopleSkills AS P1
LEFT OUTER JOIN
TargetCodes AS T1
ON T1.code IN (code1, code2, code3, code4, code5)
GROUP BY name;

Jul 20 '05 #5

Similar topics

Can you join a union

by: Tomas | last post by:

I would like to know if and how it is possible to join the *result* of a union with another table (without first doing individual joins to each part of the result that then will be combined into a...

MySQL Database

OUTER JOIN problem

by: Matt | last post by:

Hello I have to tables ar and arb, ar holds articles and a swedish description, arb holds descriptions in other languages. I want to retreive all articles that match a criteria from ar and...

Microsoft SQL Server

LEFT OUTER JOIN's possible in DB2?

by: Ian Boyd | last post by:

i know nothing about DB2, but i'm sure this must be possible. i'm trying to get a client to create a view (which it turns out is called a "Logical" in DB2). The query needs a LEFT OUTER JOIN, but...

DB2 Database

SQL92 Join and DB2

by: dunleav1 | last post by:

I have an application that uses the old join syntax instead of the SQL92 standards join syntax. I need to justify changing the code to the new standard. Is there any performance issue related to...

DB2 Database

Problem with Inner Join

by: Anila | last post by:

Hi Friends, My problem with Inner join is ... first i joined two tables and i got the result. after that iam trying to join one more table its giving syn tax error in JOIN condition. ...

Oracle Database

LEFT JOIN Optimization

by: polycom | last post by:

Assistance needed to optimize this query SELECT SD.content_id AS Id, SD.title AS Title, CT.name AS Contenttype, PV.content_id AS SponsorId, ...

MySQL Database

left outer join/inner join statement

by: teneesh | last post by:

Here I have a code for a view that has been created by a developer on my team. I am trying to use the very same code to create a view for a different formid/quesid. But I cannot figure out how this...

Microsoft SQL Server

Which is better(in terms of both memory & time): join or subquery

by: theGeek | last post by:

I always wonder which one of join or subquery should I be using to solve a particuar problem so that I get the correct result set in minimum time. It usually happens that I can write a query quickly...

Microsoft SQL Server

INNER JOIN & IS NOT NULL probs

by: MATTXtwo | last post by:

I have this store procedure to select data from table with join like this...SELECT tblPeribadi.Personel_No, tblPeribadi.Nama,tblCompany.Keterangan as Company_Code, tblPeribadi.Jawatan,...

Microsoft SQL Server

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General