>> SELECT [ID] FROM [test] WITH (NOLOCK) where [a/c/d]='a' GROUP BY
[unit #],[EFF DATE] HAVING COUNT ([unit #]) > 1 <<
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in
your schema are.
>> The problem is that I get an error that [ID] needs to be in the
GROUP BY clause or aggregate function. if I put it in there, I will
get no duplicates (because it is the identity field [sic]). <<
First of all, stop using IDENTITY as a key! That is almost certainly
how you got redundant duplicates in the first place. Read a book on
RDBMS and learn what a key is; you are imitating a sequential file in
SQL. Read a book on SQL, so your code will be closer to standard SQL,
too.
>> The whole point of this is to find dups. <<
SELECT MIN(id) AS "IDENTITY-caused screw up"
FROM Test
WHERE "a/c/d" = 'a'
GROUP BY "unit #", "eff date", ... -- rest of columns
HAVING COUNT ("unit #") > 1 ;
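As a rough check, here is a minimal sketch of that query run against SQLite from Python. The table layout, the sample rows and the dates are all made up for illustration (you did not post DDL), and the "rest of columns" is left out because this toy table only has the two grouping columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the real table; id plays the role of IDENTITY.
conn.execute("""
    CREATE TABLE Test
    (id INTEGER PRIMARY KEY,
     "unit #" INTEGER NOT NULL,
     "eff date" TEXT NOT NULL,
     "a/c/d" TEXT NOT NULL)
""")
conn.executemany(
    'INSERT INTO Test ("unit #", "eff date", "a/c/d") VALUES (?, ?, ?)',
    [(1, "2007-01-01", "a"),
     (1, "2007-01-01", "a"),    # redundant duplicate of the row above
     (2, "2007-01-01", "a"),
     (2, "2007-02-01", "a"),    # same unit, different date: not a duplicate
     (3, "2007-03-01", "c"),
     (3, "2007-03-01", "c")])   # duplicate, but removed by the WHERE clause

# One row per duplicated group; MIN(id) picks a representative row.
dup_ids = [row[0] for row in conn.execute("""
    SELECT MIN(id)
      FROM Test
     WHERE "a/c/d" = 'a'
     GROUP BY "unit #", "eff date"
    HAVING COUNT(*) > 1
""")]
print(dup_ids)
```

Only the first group is reported, because it is the only one that survives the WHERE clause and still has more than one row.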
Here is how a SELECT works in SQL ... at least in theory. Real
products will optimize things when they can.
a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors
are there. The <table expression> AS <correlation name> option allows
you to give a name to this working table, which you then have to use
for the rest of the containing query.
b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (reject UNKNOWN and FALSE). The
WHERE clause is applied to the working table built in the FROM clause.
c) Go to the optional GROUP BY clause, make groups and reduce each
group to a single row, replacing the original working table with the
new grouped table. The rows of a grouped table must be group
characteristics: (1) a grouping column, (2) a statistic about the group
(i.e., an aggregate function), (3) a function, or (4) an expression
made up of those three items.
d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.
e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done. The AS
operator can give a name to expressions in the SELECT list, too.
These new names come into existence all at once, but after the WHERE
clause, GROUP BY clause and HAVING clause have been executed; you
cannot use them in the SELECT list or the WHERE clause for that
reason.
If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY).
f) Nested query expressions follow the usual scoping rules you would
expect from a block-structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.
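Several of the steps above can be sketched with a small script. This is only an illustration run against SQLite from Python, not a statement about how any particular product executes a query; the Samples table, its columns and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Samples (grp TEXT, val INTEGER)")  # hypothetical
conn.executemany(
    "INSERT INTO Samples VALUES (?, ?)",
    [("x", 1), ("x", 5), ("y", 7), (None, 2), (None, 3), ("z", None)])

def q(sql):
    """Run a query and return all of its rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# (b) WHERE keeps only rows that test TRUE. For the row with a NULL val,
# both (val > 1) and NOT (val > 1) are UNKNOWN, so it passes neither test.
passed = q("SELECT COUNT(*) FROM Samples WHERE val > 1")[0][0]
failed = q("SELECT COUNT(*) FROM Samples WHERE NOT (val > 1)")[0][0]
total = q("SELECT COUNT(*) FROM Samples")[0][0]

# (c) GROUP BY reduces each group to one row; the NULLs form one group.
grouped = q("""
    SELECT grp, COUNT(*), SUM(val)
      FROM Samples
     GROUP BY grp
     ORDER BY grp""")

# (d) HAVING is applied to the grouped table, not to the original rows.
big_groups = q("""
    SELECT grp, COUNT(*)
      FROM Samples
     GROUP BY grp
    HAVING COUNT(*) > 1
     ORDER BY grp""")

# (e) SELECT DISTINCT treats NULLs as matching, just like GROUP BY.
distinct_grps = q("SELECT DISTINCT grp FROM Samples ORDER BY grp")

# (f) The inner query can reference S1, which belongs to the outer query.
group_maxima = q("""
    SELECT S1.grp, S1.val
      FROM Samples AS S1
     WHERE S1.val = (SELECT MAX(S2.val)
                       FROM Samples AS S2
                      WHERE S2.grp = S1.grp)
     ORDER BY S1.grp""")

print(passed, failed, total)
print(grouped)
print(big_groups)
print(distinct_grps)
print(group_maxima)
```

Note that passed plus failed is one short of total; the missing row is the one whose val is NULL. In the last query the rows with a NULL grp drop out for the same reason, since S2.grp = S1.grp is UNKNOWN for them.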