By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,719 Members | 1,875 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,719 IT Pros & Developers. It's quick & easy.

Noob: Why does this Query work?

P: n/a
I'm wondering how/why this query works. Trying to get my head wrapped
around SQL. Basically the Query deletes from the Import table all
records that are already in FooStrings so that when I do an insert from
the FooStringsImport table into the FooStrings table, then I won't get
primary key violations.

DELETE FROM FooStringsImport
WHERE EXISTS
(SELECT * FROM FooStrings
WHERE FooStringsImport.FooKey = FooStrings.FooKey)

It seems to work fine, but I'm wondering about how the EXISTS keyword
works.

(SELECT * FROM FooStrings
WHERE FooStringsImport.FooKey = FooStrings.FooKey)
This part is going to return only records from FooStrings correct? Or
does it do a cartesian product since I've specified more than one table
in the WHERE statement?

I wonder if it only returns records in FooStrings, then I don't see how
a record from FooStringsImport would "EXISTS" in the records returned
from FooStrings.

The reason I wondered about the cartesian product is because, if only
FooStrings is specified in the FROM part of the SELECT statement, then
I was thinking it is only going to return FooString records. These
records would then be returned by the select statement to the WHERE
EXISTS, which would look for FooStringImport records, but would find
none because the select statement only returned FooString records.

I'm guessing maybe because it has to do a cartesian product to evaluate
the WHERE Pkey's equal, then the "SELECT *" just goes ahead and gets
ALL the fields, and not just those in FooStrings.

FooStrings and FooStringsImport are identically structured tables,
where the FooKey is set as the primary key in each table:

CREATE TABLE [dbo].[FooStrings] (
[FooKey] [bigint] NOT NULL ,
[Name] [char] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[StartDate] [datetime] NULL ,
[EndDate] [datetime] NULL ,
[code] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO

Thanks in advance. I'm so appreciative of the help I've gotten here,
as I've been able to write several very useful queries on my own now
after everyones help and plus lots of reading on my own.

Jul 23 '05 #1
Share this Question
Share on Google+
4 Replies


P: n/a
On 27 Apr 2005 10:54:43 -0700, sh******@cs.fsu.edu wrote:
I'm guessing maybe because it has to do a cartesian product to evaluate
the WHERE Pkey's equal, then the "SELECT *" just goes ahead and gets
ALL the fields, and not just those in FooStrings.


Close. The "WHERE EXISTS" is processed such that *no* columns will be
returned by the SELECT * inside it. Instead, it just notes whether any row
was returned and assigns true or false. You could get the same behavior by
replacing SELECT * with SELECT 1, or
SQRT(LEN(FooStringsImport.FooKey))*3.1415926535, or even SELECT NULL.
All of them will do exactly the same thing: completely ignore what comes
after the word SELECT.

Conceptually, the query processor is doing a separate "SELECT *" statement
for each and every row in FooStrings. If that separate statement returns a
row, it deletes the corresponding row from FooStrings; if it doesn't, then
it doesn't.

In actuality, the query optimizer doesn't do that. It rewrites the query as
if it were a JOIN (aka cartesian product, as you stated) and uses that to
decide which rows to delete from FooStringsImport.
Jul 23 '05 #2

P: n/a
It's called a Correlated Subquery. Theoretically you can assume that the
subquery is executed once for every row in the main (outer) part of the
statement. The correlation part is the reference to the outer table
(FooStringsImport.FooKey). In other words EXISTS returns true for any
particular row in FooStringsImport if there is at least one row in
FooStrings that matches FooStringsImport.FooKey.

This is a Selection operation, not a Join and a DELETE statement rather than
a query, so I don't see that it helps much to think in terms of a Cartesian
Product. In fact SQL Server may use join operations to execute statements
involving subqueries but the server is not going to join every row to every
row before deciding which rows to delete.

See also Joe Celko's narrative on how queries work:
http://www.google.co.uk/groups?selm=...oglegroups.com

By the way, NOT EXISTS is also possible. So rather than do a DELETE followed
by an INSERT why not just filter the INSERT statement in a similar way:

INSERT INTO FooStrings (...)
SELECT ...
FROM FooStringsImport
WHERE NOT EXISTS
(SELECT *
FROM FooStrings
WHERE FooStringsImport.FooKey = FooStrings.FooKey)

--
David Portas
SQL Server MVP
--
Jul 23 '05 #3

P: n/a
On 27 Apr 2005 10:54:43 -0700, sh******@cs.fsu.edu wrote:
I'm wondering how/why this query works. Trying to get my head wrapped
around SQL. Basically the Query deletes from the Import table all
records that are already in FooStrings so that when I do an insert from
the FooStringsImport table into the FooStrings table, then I won't get
primary key violations.

DELETE FROM FooStringsImport
WHERE EXISTS
(SELECT * FROM FooStrings
WHERE FooStringsImport.FooKey = FooStrings.FooKey)

It seems to work fine, but I'm wondering about how the EXISTS keyword
works.

(SELECT * FROM FooStrings
WHERE FooStringsImport.FooKey = FooStrings.FooKey)
This part is going to return only records from FooStrings correct? Or
does it do a cartesian product since I've specified more than one table
in the WHERE statement?

(snip)

Hi Shumaker,

This is called a correlated subquery. Correlated, because the
"FooStringsImport.FooKey" correlates it to the part of the query outside
the subquery.

The way that this will be executed (in theory - the optimizer usually has
some nifty trcks to get the same result faster, but the slow version is
easier to understand):

- For the DELETE statement, start with the first row of FooStringsImport;
- Replace "FooStringsImport.FooKey" in the subquery with the FooKey value
of that first row;
- Execute the subquery (which now only references the table used in the
FROM clause, making it a stand-alone query);
- Evaluate if the EXISTS predicate is true or false;
- If it is true, delete the first row of FooStringImport; if it's false,
retain it.
- Move to the second row of FooStringsImport;
- Replace "FooStringsImport.FooKey" in the subquery with the FooKey value
of that second row;
- Execute the subquery (which now only references the table used in the
FROM clause, making it a stand-alone query);
- Evaluate if the EXISTS predicate is true or false;
- If it is true, delete the second row of FooStringImport; if it's false,
retain it.
- Move to the third row of FooStringsImport;
(etc etc etc -- until all rows of FooStringImport have been processed).

(Note - the terms "first", "second", etc are used very loosely here, as a
table has no intrinsic order)

The main difference between correlated and non-correlated subqueries, is
that a non-correlated subquery only has to be evaluated once, and the
result set can be retained and reused during the complete query
evaluation, whereas ta correlated subquery has to be recomputed for each
row in the "surrounding" layer of the query.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Jul 23 '05 #4

P: n/a
Thanks everyone! This is great info.

Jul 23 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.