delete otherwise duplicate records based on differing values in one column

P: 6
I have a table with what I consider duplicate records. Data in all columns are duplicate except for the date column, meaning that duplicate data was entered on different dates and those dates were stored along with the data.

I want to delete all of the records except the most recent one. I can select the client ID field and the max(date field) to determine the ones I want to keep, but how do I determine the ones to delete? There are often more than two duplicates, so the min(date field) doesn't do it.

Any suggestions or guidance will be appreciated!

I use:
Microsoft SQL Enterprise Manager 8.0 running on Windows XP SP2
Feb 14 '08 #1
7 Replies


Delerna
Expert 100+
P: 1,134
Paste this code into query analyser and see if it does what you need

--Set up a table and add some data for the example to work with
create table tblDulicateDates([Num1] [tinyint],[Num2] [tinyint],[dte] [datetime])

delete from tblDulicateDates
insert into tblDulicateDates select 1,1,'2007-01-01'
insert into tblDulicateDates select 1,1,'2007-01-02'
insert into tblDulicateDates select 1,1,'2007-01-03'
insert into tblDulicateDates select 1,2,'2007-01-01'
insert into tblDulicateDates select 1,2,'2007-01-02'
insert into tblDulicateDates select 1,2,'2007-01-03'
insert into tblDulicateDates select 1,3,'2007-01-01'

--show the table contents with the duplicate records except for date
select * from tblDulicateDates

--Declare the necessary variables
Declare @ThereAreDuplicates int,@Num1 int,@Num2 int, @Dte datetime

--see if there are any duplicate records
--(a group has duplicates when its earliest and latest dates differ)
set @ThereAreDuplicates=(select count(a.num1) from
    (select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
    join
    (select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
    where a.dte<>b.dte)

--if there are duplicates then enter the loop
while @ThereAreDuplicates > 0
BEGIN
    --select the duplicates that need to be deleted into a cursor
    --(the earliest-dated record of each group that still has duplicates)
    DECLARE DuplicatesCursor CURSOR FOR
    select a.num1,a.num2,a.dte from
    (select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
    join
    (select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
    where a.dte<>b.dte

    OPEN DuplicatesCursor
    FETCH NEXT FROM DuplicatesCursor
    INTO @Num1,@Num2,@Dte

    --enter a loop that deletes each of the records in the cursor
    WHILE @@FETCH_STATUS = 0
    BEGIN
        DELETE FROM tblDulicateDates where Num1=@Num1 and Num2=@Num2 and Dte=@Dte

        FETCH NEXT FROM DuplicatesCursor
        INTO @Num1,@Num2,@Dte
    END
    CLOSE DuplicatesCursor
    DEALLOCATE DuplicatesCursor

    --Check to see if there are any more duplicates still in the table
    --This is to handle the case where there are 3 or more duplicate records
    set @ThereAreDuplicates=(select count(a.num1) from
    (select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
    join
    (select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
    where a.dte<>b.dte)

END

--now show the table contents
--no duplicates and only the ones that had the max date are left
select * from tblDulicateDates

regards
Feb 14 '08 #2
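
For comparison, the same result can probably be reached without a cursor. The following is only a sketch against the sample table from the post above (tblDulicateDates with columns Num1, Num2, dte); the alias m and the column name latest are made up for illustration. It deletes every row whose date is earlier than the newest date in its (Num1, Num2) group, so any number of duplicates is handled in one statement:

```sql
-- Set-based sketch: remove every row that is older than the newest row of its group.
-- "m" and "latest" are illustrative names, not part of the original post.
DELETE tblDulicateDates
FROM tblDulicateDates
INNER JOIN (
    SELECT Num1, Num2, MAX(dte) AS latest
    FROM tblDulicateDates
    GROUP BY Num1, Num2
) AS m
    ON  m.Num1 = tblDulicateDates.Num1
    AND m.Num2 = tblDulicateDates.Num2
WHERE tblDulicateDates.dte < m.latest

-- Only the row with the latest date in each group should remain
SELECT * FROM tblDulicateDates
```

Note that rows sharing the same latest date within a group are all kept, and that rows with NULL in Num1 or Num2 never match the join, so they are left untouched.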

P: 6
Thank you. I will give this a try!

-Jeanne
Feb 15 '08 #3

P: 6
The sample code worked well. It didn't quite do what I need, though. I tried to fit it into my situation, but had no luck.

I have been given a table (TABLEDUPS) of known duplicate records. I need to delete records from a different table (TABLEEVENTS) based on this duplicate records table.

So what I really need to do is look at TABLEDUPS to determine the duplicate records to keep (most recent) and delete records in TABLEEVENTS that match the remaining record(s) in TABLEDUPS.

Any ideas...? Thanks.
Feb 19 '08 #4

ck9663
Expert 2.5K+
P: 2,878
"delete records in TABLEEVENTS that match the remaining record(s) in TABLEDUPS."

Could you define "match"? What columns will you use to determine whether a record matches? It would help if you gave the structure of those two tables.

-- CK
Feb 19 '08 #5

P: 6
They need to match on key fields. The structure for these key fields is the same for both tables:
Region varchar 2
ID varchar 9
Claim varchar 6
SVCDate datetime 8
Modifier varchar 2
Provider varchar 15
Profess varchar 15
Place varchar 2
ReportDate datetime 8

Once the match is made, I need to look at ReportDate in TABLEDUPS, then delete from TABLEEVENTS all records except the one with the most recent date.

Thanks.
Feb 21 '08 #6

ck9663
Expert 2.5K+
P: 2,878
Run this first:
select tableevents.*
from tableevents
-- inner join, so that only rows older than the latest ReportDate in TABLEDUPS come back;
-- a left join would also return every unmatched row of tableevents
inner join
(select Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place, max(ReportDate) as latest from TABLEDUPS
group by Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place) as dups
on dups.Region = tableevents.Region and dups.ID = tableevents.ID
 and dups.Claim = tableevents.Claim and dups.SVCDate = tableevents.SVCDate
 and dups.Modifier = tableevents.Modifier and dups.Provider = tableevents.Provider
 and dups.Profess = tableevents.Profess and dups.Place = tableevents.Place
 and dups.latest > tableevents.ReportDate

If it returns only the records you want to remove, you can turn it into a DELETE query. I'm asking you to run the SELECT first so that you can check that you're deleting the right rows and not ones you want to keep. Even so, it's always good to have a backup.
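
One possible shape for that DELETE, as a sketch only: it assumes the table and column names given earlier in the thread, uses an inner join so that only matched, older rows are affected, and wraps the statement in a transaction that is rolled back by default so the row count can be checked first. The column alias rows_deleted is made up for illustration.

```sql
BEGIN TRANSACTION

-- Delete rows in tableevents that are older than the latest ReportDate
-- recorded in TABLEDUPS for the same key columns.
DELETE tableevents
FROM tableevents
INNER JOIN (
    SELECT Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place,
           MAX(ReportDate) AS latest
    FROM TABLEDUPS
    GROUP BY Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place
) AS dups
    ON  dups.Region   = tableevents.Region
    AND dups.ID       = tableevents.ID
    AND dups.Claim    = tableevents.Claim
    AND dups.SVCDate  = tableevents.SVCDate
    AND dups.Modifier = tableevents.Modifier
    AND dups.Provider = tableevents.Provider
    AND dups.Profess  = tableevents.Profess
    AND dups.Place    = tableevents.Place
    AND dups.latest   > tableevents.ReportDate

-- Compare this with the number of rows the SELECT returned
SELECT @@ROWCOUNT AS rows_deleted

-- Change to COMMIT TRANSACTION once the count looks right
ROLLBACK TRANSACTION
```

If any of the key columns can contain NULL (Modifier, for example), those rows will not match on the = comparisons and will be left in place, so they may need separate handling.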

Happy coding

-- CK
Feb 21 '08 #7

P: 6
Thanks. I'll give this a try.
Feb 22 '08 #8