wa**********@hotmail.com (Wanny) wrote in message news:<e5*************************@posting.google.c om>...
Hi Andre,
Thanks a lot for your help.
Your query did exactly what I needed.
I don't quite understand how the query works though and would
appreciate it if you could help me out further.
The bit I don't understand is, how does the query know to remove items
with a UserID less than the MAX(UserID).
We have the Join criteria
"on ToBeDeleted.UserID < ToBeKept.UserID"
but it doesn't specify anywhere that it's to be less than the
MAX(UserID) so how does it do it?
I have to apologise for my ignorance if the answer is obvious and
sorry for pestering you further.
Thanks.
So let's try to create something that might work a little better.
Since you didn't post any DDL I'm going to guess a little, but you'll
get the idea. I'll use a subselect which will will select those
userid's to be deleted (I'm keeping the userid's with the largest
value, under the assumption that those are the most recent ones)
create table user_details
(
UserID int not null,
Useremail varchar(255) not null
)
insert into user_details values
(1,'a**@def.gh')
insert into user_details values
(2,'i**@lmn.op')
insert into user_details values
(3,'q**@tuv.wx')
insert into user_details values
(4,'a**@def.gh')
insert into user_details values
(5,'i**@lmn.op')
insert into user_details values
(6,'f**@bar.org')
insert into user_details values
(7,'a**@def.gh')
DELETE FROM
user_details where userid in
(
select
Distinct ToBeDeleted.userid
FROM
user_details ToBeDeleted inner join user_details ToBeKept
on ToBeDeleted.UserID < ToBeKept.UserID and ToBeDeleted.UserEmail =
ToBeKept.UserEmail
)
With Regards
Andre Kuyt
A little further clarification seems to be in order, so lets move back
to the original example data and try this little query to see what it
actually is that it is selecting:
select * from user_details ToBeDeleted inner join user_details
ToBeKept
on ToBeDeleted.UserID < ToBeKept.UserID and ToBeDeleted.UserEmail =
ToBeKept.UserEmail
order by toBeDeleted.userid
This yields the following resultset
UserID UserEmail UserID Useremail
----------- ------------ ----------- ----------
1
ab*@def.gh 4
ab*@def.gh
1
ab*@def.gh 7
ab*@def.gh
2
ij*@lmn.op 5
ij*@lmn.op
4
ab*@def.gh 7
ab*@def.gh
So what we see is that the condition ToBeDeleted.UserID <
ToBeKept.UserID
actually results in at least one row in the resultset for each userid
which has at least one higher ranking userid with the same email
address.
UserID 1 actually appears twice, because it has 2 higher userids with
the same email address (4 and 7)
So what we know from this resultset is the following: If a userid
appears in the left most column, there is at least 1 userid with a
higher number in the table. As you can see for yourself UserID 7 is
excluded from the ToBeDeleted list because there is no larger userID
available. For that same reason all userid's without duplicate emails
are excluded from the ToBeDeleted results.
So if you remove all the rows with the ToBeDeleted UserID's from the
table you will be left with those records without duplicates which
have the Maximum userID
So, as you can see, the query itself actually knows nothing about a
MAX(UserID), which isn't surprising at all as queries have no brains
:)
(Advanced topic:
As a variation of this query you could use the join condition
ToBeDeleted.UserID > ToBeKept.UserID to keep all the minimum UserID's.
)
Andre