SQL to find short, fat and stupid people

Justin Hoffman

This is a question concerning query optimisation. Sorry if it's a bit long,
but thanks to anyone who has the patience to help - This is my first post
here...
If I have two tables: 'tblContact' and 'tblCategory' where categories are
like:
Code    Name
010101  Short
010102  Fat
010103  Stupid

The junction table 'tblConCat' has fields CctConID, CctCatCode to tell me
which contacts have which category codes. These are nchar(6) fields, if it
makes any difference.

If I need to find all people who are short, fat and stupid I can see two
ways:

Solution One:

SELECT tblContact.* FROM tblContact WHERE
ConID IN (SELECT CctConID FROM tblConCat WHERE CctCatCode='010101') AND
ConID IN (SELECT CctConID FROM tblConCat WHERE CctCatCode='010102') AND
ConID IN (SELECT CctConID FROM tblConCat WHERE CctCatCode='010103')

Solution Two:
Build a helper table which contains the codes I'm looking for

SELECT tblContact.* FROM tblContact
WHERE ConID IN
(SELECT CctConID
FROM tblConCat INNER JOIN tblHelper
ON tblConCat.CctCatCode = tblHelper.HlpCatCode
GROUP BY CctConID HAVING Count(*)=3)

I have tried them both out and although I looked at the query analyzer it
provided more information than I knew what to do with. In practice they
both provide similar working times (I wait about a second) but I thought the
first looked rather inefficient and thought my helper table might help.
There are about 30,000 contact records, 180 category records and 120,000
junction table records.

All I am looking for is comments on any pros and cons of these two approaches -
does one look that bad? The database was migrated from Access where the
helper table really did help, but using SQL Server I might not need it.
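
To make the comparison concrete, here is a minimal sketch that runs both solutions against a tiny in-memory SQLite database from Python. The table and column names follow the post; the rows, the ConName column and the use of SQLite instead of SQL Server are assumptions made purely for illustration, and the queries select only ConID to keep the output short.

```python
import sqlite3

# Tiny stand-in data set: names follow the post, the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblContact (ConID TEXT PRIMARY KEY, ConName TEXT);
    CREATE TABLE tblConCat  (CctConID TEXT, CctCatCode TEXT,
                             PRIMARY KEY (CctConID, CctCatCode));
    CREATE TABLE tblHelper  (HlpCatCode TEXT PRIMARY KEY);

    INSERT INTO tblContact VALUES ('C1', 'Ann'), ('C2', 'Bob'), ('C3', 'Cy');
    INSERT INTO tblConCat VALUES
        ('C1', '010101'), ('C1', '010102'), ('C1', '010103'),
        ('C2', '010101'), ('C2', '010103'),
        ('C3', '010101'), ('C3', '010102'), ('C3', '010103'), ('C3', '010104');
    INSERT INTO tblHelper VALUES ('010101'), ('010102'), ('010103');
""")

# Solution One: one IN subquery per required category.
solution_one = """
    SELECT ConID FROM tblContact WHERE
    ConID IN (SELECT CctConID FROM tblConCat WHERE CctCatCode = '010101') AND
    ConID IN (SELECT CctConID FROM tblConCat WHERE CctCatCode = '010102') AND
    ConID IN (SELECT CctConID FROM tblConCat WHERE CctCatCode = '010103')
    ORDER BY ConID
"""

# Solution Two: join to the helper table and count the matches per contact.
solution_two = """
    SELECT ConID FROM tblContact
    WHERE ConID IN
        (SELECT CctConID
         FROM tblConCat INNER JOIN tblHelper
         ON tblConCat.CctCatCode = tblHelper.HlpCatCode
         GROUP BY CctConID HAVING COUNT(*) = 3)
    ORDER BY ConID
"""

result_one = [row[0] for row in conn.execute(solution_one)]
result_two = [row[0] for row in conn.execute(solution_two)]
print(result_one, result_two)  # both list C1 and C3; C2 lacks '010102'
```

Contact C3 carries a fourth category that is not in the helper table, which shows that neither query is bothered by extra categories.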

Thanks again, if you got this far!

Jul 23 '05 #1

Dan Guzman

> I have tried them both out and although I looked at the query analyzer it
> provided more information than I knew what to do with. In practice they
> both provide similar working times

You can compare the estimated costs by running your queries at the same time
in Query Analyzer with the show execution plan option on. However, I would
still choose the query with the lowest actual execution time. All things
being equal, select the simplest approach.

BTW, make sure you have a unique index on tblConCat (CctConID and
CctCatCode). A primary key, unique constraint or unique index will provide
this.
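
The uniqueness matters for more than speed: the HAVING Count(*)=3 test in Solution Two only means "three different categories" if a contact cannot be given the same category twice. A minimal sketch of the constraint, again using SQLite from Python as a stand-in for SQL Server; the index name ux_tblConCat is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblConCat (CctConID TEXT, CctCatCode TEXT);
    -- Hypothetical index name; the column pair is the one named in the post.
    CREATE UNIQUE INDEX ux_tblConCat ON tblConCat (CctConID, CctCatCode);
    INSERT INTO tblConCat VALUES ('C1', '010101');
""")

try:
    # A second identical contact/category pairing violates the unique index.
    conn.execute("INSERT INTO tblConCat VALUES ('C1', '010101')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM tblConCat").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```

Without the index, the duplicate row would be stored and a contact with only two distinct categories could reach a count of three.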

Untested Solution Three (there are many more):

SELECT tblContact.*
FROM tblContact c
JOIN tblConCat cc1 ON
c.ConID = cc1.CctConID AND
CctCatCode='010101'
JOIN tblConCat cc2 ON
c.ConID = cc2.CctConID AND
CctCatCode='010102'
JOIN tblConCat cc3 ON
c.ConID = cc3.CctConID AND
CctCatCode='010103'

--
Hope this helps.

Dan Guzman
SQL Server MVP

Jul 23 '05 #2

Justin Hoffman

Hi Dan
Thank you for your comments. The SQL needed a couple of adjustments: the
CctCatCode fields needed a table prefix to make them unambiguous, and
(probably a typo) I needed SELECT c.* rather than SELECT tblContact.*. So
the final version is:

SELECT c.*
FROM tblContact c
JOIN tblConCat cc1 ON
c.ConID = cc1.CctConID AND
cc1.CctCatCode = '010101'
JOIN tblConCat cc2 ON
c.ConID = cc2.CctConID AND
cc2.CctCatCode = '010102'
JOIN tblConCat cc3 ON
c.ConID = cc3.CctConID AND
cc3.CctCatCode = '010103'

Anyway, the result again is pretty fast and I guess I am wasting my time
building a helper table if it doesn't noticeably increase the speed but does
increase the complexity (the helper tables need to be dynamically created in
a multi-user environment).
I looked at the execution plans, but they give so much information (cost,
subtree cost, etc) that it leaves a simple person like me somewhat lost.
What I would like to know is 'which is better: A or B?' Even when I run
them, I am not sure how to compare the actual performance. Is there one key
number to look at? Perhaps 'Cumulative client processing time'? Perhaps I
should try to read up on this.
(PS I do have that index you mentioned)

Jul 23 '05 #3

Dan Guzman

> What I would like to know is 'which is better: A or B?' Even when I run
> them, I am not sure how to compare the actual performance. Is there one
> key number to look at?

I usually compare actual query duration in a controlled test environment
with no other activity and a clean cache.

CHECKPOINT
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO
SELECT GETDATE() AS StartTime
GO
--execute first query
GO
SELECT GETDATE() AS EndTime
GO

CHECKPOINT
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO
SELECT GETDATE() AS StartTime
GO
--execute second query
GO
SELECT GETDATE() AS EndTime
GO
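
The same "time each candidate under identical conditions" idea can be sketched outside Query Analyzer. The harness below is only a rough analogue, not the method above: it uses SQLite from Python, which has no counterpart to the DBCC commands, so it reports the best of several warm runs instead of a cold-cache run. The helper name time_query, the integer keys and the generated rows are all assumptions for illustration.

```python
import sqlite3
import time

def time_query(conn, sql, repeats=5):
    """Run one query several times; return its rows and the best wall-clock time."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblContact (ConID INTEGER PRIMARY KEY);
    CREATE TABLE tblConCat  (CctConID INTEGER, CctCatCode TEXT,
                             PRIMARY KEY (CctConID, CctCatCode));
""")
# Invented data: every contact is Short, every 2nd is Fat, every 3rd is Stupid.
contact_ids = range(3000)
conn.executemany("INSERT INTO tblContact VALUES (?)", [(i,) for i in contact_ids])
pairs = [(i, "010101") for i in contact_ids]
pairs += [(i, "010102") for i in contact_ids if i % 2 == 0]
pairs += [(i, "010103") for i in contact_ids if i % 3 == 0]
conn.executemany("INSERT INTO tblConCat VALUES (?, ?)", pairs)

in_version = """
    SELECT ConID FROM tblContact WHERE
    ConID IN (SELECT CctConID FROM tblConCat WHERE CctCatCode = '010101') AND
    ConID IN (SELECT CctConID FROM tblConCat WHERE CctCatCode = '010102') AND
    ConID IN (SELECT CctConID FROM tblConCat WHERE CctCatCode = '010103')
    ORDER BY ConID
"""
join_version = """
    SELECT c.ConID
    FROM tblContact c
    JOIN tblConCat cc1 ON c.ConID = cc1.CctConID AND cc1.CctCatCode = '010101'
    JOIN tblConCat cc2 ON c.ConID = cc2.CctConID AND cc2.CctCatCode = '010102'
    JOIN tblConCat cc3 ON c.ConID = cc3.CctConID AND cc3.CctCatCode = '010103'
    ORDER BY c.ConID
"""

rows_in, t_in = time_query(conn, in_version)
rows_join, t_join = time_query(conn, join_version)
print(len(rows_in), len(rows_join), f"{t_in:.6f}s", f"{t_join:.6f}s")
```

Checking that both queries return the same rows before comparing the timings guards against declaring a wrong query the winner.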

--
Hope this helps.

Dan Guzman
SQL Server MVP

Jul 23 '05 #4

Justin Hoffman

Thanks, Dan. I'll put that in my 'useful things to know' file.

Jul 23 '05 #5

--CELKO--

This is a "relational division"; it is one of Codd's original
operations. You might also want to look up the proper way to name data
elements, so you will not have those silly "tbl-" prefixes and multiple
suffixes. And rows are not records.

Relational division is one of the eight basic operations in Codd's
relational algebra. The idea is that a divisor table is used to
partition a dividend table and produce a quotient or results table.
The quotient table is made up of those values of one column for which a
second column had all of the values in the divisor.

This is easier to explain with an example. We have a table of pilots
and the planes they can fly (dividend); we have a table of planes in
the hangar (divisor); we want the names of the pilots who can fly every
plane (quotient) in the hangar. To get this result, we divide the
PilotSkills table by the planes in the hangar.

CREATE TABLE PilotSkills
(pilot CHAR(15) NOT NULL,
plane CHAR(15) NOT NULL,
PRIMARY KEY (pilot, plane));

PilotSkills
pilot      plane
=========  ==============
'Celko'    'Piper Cub'
'Higgins'  'B-52 Bomber'
'Higgins'  'F-14 Fighter'
'Higgins'  'Piper Cub'
'Jones'    'B-52 Bomber'
'Jones'    'F-14 Fighter'
'Smith'    'B-1 Bomber'
'Smith'    'B-52 Bomber'
'Smith'    'F-14 Fighter'
'Wilson'   'B-1 Bomber'
'Wilson'   'B-52 Bomber'
'Wilson'   'F-14 Fighter'
'Wilson'   'F-17 Fighter'

CREATE TABLE Hangar
(plane CHAR(15) NOT NULL PRIMARY KEY);

Hangar
plane
==============
'B-1 Bomber'
'B-52 Bomber'
'F-14 Fighter'

PilotSkills DIVIDED BY Hangar
pilot
========
'Smith'
'Wilson'

In this example, Smith and Wilson are the two pilots who can fly
everything in the hangar. Notice that Higgins and Celko know how to
fly a Piper Cub, but we don't have one right now. In Codd's original
definition of relational division, having more rows than are called for
is not a problem.

The important characteristic of a relational division is that the CROSS
JOIN (Cartesian product) of the divisor and the quotient produces a
valid subset of rows from the dividend. This is where the name comes
from, since the CROSS JOIN acts like a multiplication operator.

Division with a Remainder

There are two kinds of relational division. Division with a remainder
allows the dividend table to have more values than the divisor, which
was Codd's original definition. For example, if a pilot can fly more
planes than just those we have in the hangar, this is fine with us.
The query can be written in SQL-89 as

SELECT DISTINCT pilot
  FROM PilotSkills AS PS1
 WHERE NOT EXISTS
       (SELECT *
          FROM Hangar
         WHERE NOT EXISTS
               (SELECT *
                  FROM PilotSkills AS PS2
                 WHERE (PS1.pilot = PS2.pilot)
                   AND (PS2.plane = Hangar.plane)));

The quickest way to explain what is happening in this query is to
imagine an old World War II movie where a cocky pilot has just walked
into the hangar, looked over the fleet, and announced, "There ain't no
plane in this hangar that I can't fly!" We are finding the pilots for
whom there does not exist a plane in the hangar for which they have no
skills. The use of the NOT EXISTS() predicates is for speed. Most SQL
systems will look up a value in an index rather than scan the whole
table. The SELECT * clause lets the query optimizer choose the column
to use when looking for the index.

This query for relational division was made popular by Chris Date in
his textbooks, but it is not the only method nor always the fastest.
Another version of the division can be written so as to avoid three
levels of nesting. While it is not original with me, I have made it
popular in my books.

SELECT PS1.pilot
FROM PilotSkills AS PS1, Hangar AS H1
WHERE PS1.plane = H1.plane
GROUP BY PS1.pilot
HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar);

There is a serious difference in the two methods. Burn down the
hangar, so that the divisor is empty. Because of the NOT EXISTS()
predicates in Date's query, all pilots are returned from a division by
an empty set. Because of the COUNT() functions in my query, no pilots
are returned from a division by an empty set.
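
The difference between the two methods can be checked directly. The sketch below loads the PilotSkills and Hangar sample rows from this post into an in-memory SQLite database (an assumption; the post itself is engine-neutral), runs both division queries, then empties the hangar and runs them again. An ORDER BY is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PilotSkills
    (pilot CHAR(15) NOT NULL,
     plane CHAR(15) NOT NULL,
     PRIMARY KEY (pilot, plane));
    CREATE TABLE Hangar
    (plane CHAR(15) NOT NULL PRIMARY KEY);

    INSERT INTO PilotSkills VALUES
        ('Celko', 'Piper Cub'),
        ('Higgins', 'B-52 Bomber'), ('Higgins', 'F-14 Fighter'), ('Higgins', 'Piper Cub'),
        ('Jones', 'B-52 Bomber'), ('Jones', 'F-14 Fighter'),
        ('Smith', 'B-1 Bomber'), ('Smith', 'B-52 Bomber'), ('Smith', 'F-14 Fighter'),
        ('Wilson', 'B-1 Bomber'), ('Wilson', 'B-52 Bomber'),
        ('Wilson', 'F-14 Fighter'), ('Wilson', 'F-17 Fighter');
    INSERT INTO Hangar VALUES ('B-1 Bomber'), ('B-52 Bomber'), ('F-14 Fighter');
""")

# Nested NOT EXISTS version (the one popularized by Date).
nested_not_exists = """
    SELECT DISTINCT pilot
    FROM PilotSkills AS PS1
    WHERE NOT EXISTS
        (SELECT * FROM Hangar
         WHERE NOT EXISTS
            (SELECT * FROM PilotSkills AS PS2
             WHERE PS1.pilot = PS2.pilot
               AND PS2.plane = Hangar.plane))
    ORDER BY pilot
"""

# COUNT version: match against the hangar, then compare counts.
count_version = """
    SELECT PS1.pilot
    FROM PilotSkills AS PS1, Hangar AS H1
    WHERE PS1.plane = H1.plane
    GROUP BY PS1.pilot
    HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)
    ORDER BY PS1.pilot
"""

def pilots(sql):
    return [row[0] for row in conn.execute(sql)]

full_nested, full_count = pilots(nested_not_exists), pilots(count_version)

conn.execute("DELETE FROM Hangar")  # burn down the hangar
empty_nested, empty_count = pilots(nested_not_exists), pilots(count_version)

print(full_nested, full_count)  # ['Smith', 'Wilson'] twice
print(empty_nested)             # all five pilots
print(empty_count)              # []
```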

In the sixth edition of his book, INTRODUCTION TO DATABASE SYSTEMS
(Addison-Wesley; 1995; ISBN 0-201-82458-2), Chris Date defined another
operator (DIVIDEBY ... PER) which produces the same results as my
query, but with more complexity.

Exact Division

The second kind of relational division is exact relational division.
The dividend table must match exactly to the values of the divisor
without any extra values.

SELECT PS1.pilot
FROM PilotSkills AS PS1
LEFT OUTER JOIN
Hangar AS H1
ON PS1.plane = H1.plane
GROUP BY PS1.pilot
HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)
AND COUNT(H1.plane) = (SELECT COUNT(plane) FROM Hangar);

This says that a pilot must have the same number of certificates as
there are planes in the hangar and these certificates all match to a plane
in the hangar, not something else. The "something else" is shown by a
created NULL from the LEFT OUTER JOIN.

Please do not make the mistake of trying to reduce the HAVING clause
with a little algebra to:

HAVING COUNT(PS1.plane) = COUNT(H1.plane)

because it does not work; it only tells you that every plane the pilot is
certified for is in the hangar, not that the pilot is certified for every
plane in the hangar. A pilot who can fly just some of the hangar's planes
and nothing else would still pass.
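
A quick check on the sample data shows the gap between the two HAVING clauses. This is a sketch under the same assumption as before (SQLite from Python standing in for whatever engine is in use); the rows are the PilotSkills and Hangar rows listed earlier in this post, and an ORDER BY is added for stable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PilotSkills
    (pilot CHAR(15) NOT NULL,
     plane CHAR(15) NOT NULL,
     PRIMARY KEY (pilot, plane));
    CREATE TABLE Hangar
    (plane CHAR(15) NOT NULL PRIMARY KEY);

    INSERT INTO PilotSkills VALUES
        ('Celko', 'Piper Cub'),
        ('Higgins', 'B-52 Bomber'), ('Higgins', 'F-14 Fighter'), ('Higgins', 'Piper Cub'),
        ('Jones', 'B-52 Bomber'), ('Jones', 'F-14 Fighter'),
        ('Smith', 'B-1 Bomber'), ('Smith', 'B-52 Bomber'), ('Smith', 'F-14 Fighter'),
        ('Wilson', 'B-1 Bomber'), ('Wilson', 'B-52 Bomber'),
        ('Wilson', 'F-14 Fighter'), ('Wilson', 'F-17 Fighter');
    INSERT INTO Hangar VALUES ('B-1 Bomber'), ('B-52 Bomber'), ('F-14 Fighter');
""")

# Exact division: both counts must equal the size of the hangar.
exact_division = """
    SELECT PS1.pilot
    FROM PilotSkills AS PS1
    LEFT OUTER JOIN Hangar AS H1
    ON PS1.plane = H1.plane
    GROUP BY PS1.pilot
    HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)
       AND COUNT(H1.plane) = (SELECT COUNT(plane) FROM Hangar)
    ORDER BY PS1.pilot
"""

# The tempting shortcut that the post warns against.
reduced_having = """
    SELECT PS1.pilot
    FROM PilotSkills AS PS1
    LEFT OUTER JOIN Hangar AS H1
    ON PS1.plane = H1.plane
    GROUP BY PS1.pilot
    HAVING COUNT(PS1.plane) = COUNT(H1.plane)
    ORDER BY PS1.pilot
"""

exact_pilots = [row[0] for row in conn.execute(exact_division)]
reduced_pilots = [row[0] for row in conn.execute(reduced_having)]
print(exact_pilots)    # ['Smith']
print(reduced_pilots)  # ['Jones', 'Smith']
```

Jones flies only two of the three hangar planes, yet the shortcut accepts him because both of his counts are 2.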

Note on Performance

The nested EXISTS() predicates version of relational division was made
popular by Chris Date's textbooks, while the author is associated with
popularizing the COUNT(*) version of relational division. The Winter
1996 edition of DB2 ON-LINE MAGAZINE
(http://www.db2mag.com/96011ar:htm) had an article entitled "Powerful
SQL: Beyond the Basics" by Sheryl Larsen which gave the results of
testing both methods. Her conclusion for DB2 was that the nested
EXISTS() version is better when the quotient has less than 25% of the
dividend table's rows and the COUNT(*) version is better when the
quotient is more than 25% of the dividend table.

Jul 23 '05 #6

Justin Hoffman

Thank you CELKO. Was this an answer specifically for me, or a cut and paste
from elsewhere? Apart from the specific advice that my naming convention is
silly, do you have any suggested SQL for the problem in hand? I can't quite
make the jump from 'which pilots can fly all the planes in the hangar' to
'which contacts are short, fat and stupid'. I do already have two solutions
and Dan Guzman suggested a third and all of these work. The question is
really about optimizing the SQL now that the database has moved from Access
to SQL Server.

Jul 23 '05 #7

Ross Presser

On Fri, 22 Apr 2005 15:54:31 +0000 (UTC), Justin Hoffman wrote:

> Thank you CELKO. Was this an answer specifically for me, or a cut and paste
> from elsewhere?

It is his standard Relational Division piece.

> Apart from the specific advice that my naming convention is
> silly, do you have any suggested SQL for the problem in hand? I can't quite
> make the jump from 'which pilots can fly all the planes in the hangar' to
> 'which contacts are short, fat and stupid'. I do already have two solutions
> and Dan Guzman suggested a third and all of these work. The question is
> really about optimizing the SQL now that the database has moved from Access
> to SQL Server.

In fact, Celko's preferred Relational Division style amounts to the same
thing as your second solution, except that no "helper table" is really
necessary. Here is my take on it:

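-- Group on the junction table in a subquery: SQL Server requires every
-- non-aggregated SELECT column to appear in GROUP BY, so C.* cannot be
-- combined with GROUP BY C.ConID.
SELECT C.* FROM tblContact C
WHERE C.ConID IN
(SELECT CC.CctConID
FROM tblConCat CC
WHERE CC.CctCatCode IN ('010101', '010102', '010103')
GROUP BY CC.CctConID
HAVING COUNT(*) = 3)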

Now say you don't want to limit it to just those three categories. Suppose
your tblCategory looks like this:

CREATE TABLE tblCategory (
CatCode char(6),
CatName varchar(20),
InsultLevel int
)

INSERT INTO tblCategory VALUES ('010101', 'Short', 1)
INSERT INTO tblCategory VALUES ('010102', 'Fat', 2)
INSERT INTO tblCategory VALUES ('010103', 'Stupid', 2)
INSERT INTO tblCategory VALUES ('010104', 'Spanish', 0)
INSERT INTO tblCategory VALUES ('010105', 'Unemployable', 1)

And you want to find all the contacts who match all categories that have an
InsultLevel of at least 1. You don't have to know ahead of time what
categories those are, or even how many categories those are:

SELECT C.* FROM tblContact C
WHERE C.ConID IN
(SELECT CC.CctConID
FROM tblConCat CC
INNER JOIN tblCategory CG
ON CC.CctCatCode = CG.CatCode
WHERE CG.InsultLevel >= 1
GROUP BY CC.CctConID
HAVING COUNT(*) = (
SELECT COUNT(*) FROM tblCategory
WHERE InsultLevel >= 1
))

Make sense?
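
As a rough check of the idea, the sketch below runs the InsultLevel query against a small in-memory SQLite database from Python. The category rows are the ones inserted above; the three contacts and their category assignments are invented, SQLite stands in for SQL Server, and only ConID is selected so that the GROUP BY is valid on any engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCategory (CatCode char(6), CatName varchar(20), InsultLevel int);
    CREATE TABLE tblContact  (ConID TEXT PRIMARY KEY);
    CREATE TABLE tblConCat   (CctConID TEXT, CctCatCode TEXT,
                              PRIMARY KEY (CctConID, CctCatCode));

    INSERT INTO tblCategory VALUES ('010101', 'Short', 1);
    INSERT INTO tblCategory VALUES ('010102', 'Fat', 2);
    INSERT INTO tblCategory VALUES ('010103', 'Stupid', 2);
    INSERT INTO tblCategory VALUES ('010104', 'Spanish', 0);
    INSERT INTO tblCategory VALUES ('010105', 'Unemployable', 1);

    -- Invented contacts: C1 has the four insulting categories, C2 has three
    -- of them, C3 has all five categories.
    INSERT INTO tblContact VALUES ('C1'), ('C2'), ('C3');
    INSERT INTO tblConCat VALUES
        ('C1', '010101'), ('C1', '010102'), ('C1', '010103'), ('C1', '010105'),
        ('C2', '010101'), ('C2', '010102'), ('C2', '010103'),
        ('C3', '010101'), ('C3', '010102'), ('C3', '010103'),
        ('C3', '010104'), ('C3', '010105');
""")

# The divisor is whatever tblCategory currently says is insulting.
division = """
    SELECT C.ConID FROM tblContact C
    INNER JOIN tblConCat CC
    ON C.ConID = CC.CctConID
    INNER JOIN tblCategory CG
    ON CC.CctCatCode = CG.CatCode
    WHERE CG.InsultLevel >= 1
    GROUP BY C.ConID
    HAVING COUNT(*) = (
        SELECT COUNT(*) FROM tblCategory
        WHERE InsultLevel >= 1
    )
    ORDER BY C.ConID
"""

before = [row[0] for row in conn.execute(division)]

# Raise one more category to InsultLevel 1: the divisor grows with no query change.
conn.execute("UPDATE tblCategory SET InsultLevel = 1 WHERE CatCode = '010104'")
after = [row[0] for row in conn.execute(division)]

print(before)  # ['C1', 'C3']
print(after)   # ['C3']
```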

Jul 23 '05 #8

Hugo Kornelis

On 22 Apr 2005 08:05:29 -0700, --CELKO-- wrote:

(snip)

> Exact Division

(snip)

> SELECT PS1.pilot
> FROM PilotSkills AS PS1
> LEFT OUTER JOIN
> Hangar AS H1
> ON PS1.plane = H1.plane
> GROUP BY PS1.pilot
> HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)
> AND COUNT(H1.plane) = (SELECT COUNT(plane) FROM Hangar);

(snip)

> Please do not make the mistake of trying to reduce the HAVING clause
> with a little algebra to:
>
> HAVING COUNT(PS1.plane) = COUNT(H1.plane)
>
> because it does not work;

Hi Joe,

But (correct me if I'm wrong) there is nothing wrong with reducing the
HAVING clause to

HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)
AND COUNT(H1.plane) = COUNT(PS1.plane);

Right?

A clever optimizer would of course not perform the same subquery twice,
but I'm not too convinced that all optimizers are that clever...

(In fact, I just tested both on your sample data, and the execution plan
shows that SQL Server 2000 will indeed perform the same COUNT subquery
twice - at least with this small amount of rows. Maybe it will cough up
a better plan when there are a few million rows involved?)

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

Jul 23 '05 #9

--CELKO--

>> just tested both on your sample data, and the execution plan shows
that SQL Server 2000 will indeed perform the same COUNT subquery twice
- at least with this small amount of rows. <<

That is awful! That is a matter of a parser finding a deterministic
common subexpression, not something hard.

>> Maybe it will cough up a better plan when there are a few million
rows involved? <<

I don't know, but it looks like the optimizer is just plain stupid in
the HAVING clause.

Jul 23 '05 #10


8913

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

8761

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

9426

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

9200

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

9142

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...

General

8144

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...

Career Advice

6016

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

4525

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

4795

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET