delete otherwise duplicate records based on differing values in one column

I have a table with what I consider duplicate records. The data in every column is identical except for the date column, meaning that the same data was entered on different dates and those dates were stored along with the data.

I want to delete all of these records except the most recent one. I can select the client ID field and the max(date field) to determine the ones I want to keep, but how do I determine the ones to delete? There are often more than two duplicates, so the min(date field) doesn't do it.

Any suggestions or guidance will be appreciated!

I use:
Microsoft SQL Enterprise Manager 8.0 running on Windows XP SP2
Feb 14 '08 #1
Delerna
1,134 Expert 1GB
Paste this code into Query Analyzer and see if it does what you need.

--Set up a table and add some data for the example to work with
create table tblDulicateDates([Num1] [tinyint],[Num2] [tinyint],[dte] [datetime])

delete from tblDulicateDates
insert into tblDulicateDates select 1,1,'2007-01-01'
insert into tblDulicateDates select 1,1,'2007-01-02'
insert into tblDulicateDates select 1,1,'2007-01-03'
insert into tblDulicateDates select 1,2,'2007-01-01'
insert into tblDulicateDates select 1,2,'2007-01-02'
insert into tblDulicateDates select 1,2,'2007-01-03'
insert into tblDulicateDates select 1,3,'2007-01-01'

--Show the table contents, including the records that are duplicates except for the date
select * from tblDulicateDates

--Declare the necessary variables
Declare @ThereAreDuplicates int,@Num1 int,@Num2 int, @Dte datetime

--See if there are any duplicate records:
--a group has duplicates when its earliest and latest dates differ
set @ThereAreDuplicates=(select count(a.num1) from
    (select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
    join
    (select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
    where a.dte<>b.dte)

--If there are duplicates then enter the loop
while @ThereAreDuplicates > 0
BEGIN
    --Select the duplicates that need to be deleted into a cursor
    --(the earliest-dated record of each group that still has duplicates)
    DECLARE DuplicatesCursor CURSOR FOR
    select a.num1,a.num2,a.dte from
    (select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
    join
    (select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
    where a.dte<>b.dte

    OPEN DuplicatesCursor
    FETCH NEXT FROM DuplicatesCursor
    INTO @Num1,@Num2,@Dte

    --Enter a loop that deletes each of the records in the cursor
    WHILE @@FETCH_STATUS = 0
    BEGIN
        DELETE FROM tblDulicateDates where Num1=@Num1 and Num2=@Num2 and Dte=@Dte

        FETCH NEXT FROM DuplicatesCursor
        INTO @Num1,@Num2,@Dte
    END
    CLOSE DuplicatesCursor
    DEALLOCATE DuplicatesCursor

    --Check to see if there are any more duplicates still in the table
    --This is to handle the case where there are 3 or more duplicate records
    set @ThereAreDuplicates=(select count(a.num1) from
    (select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
    join
    (select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
    where a.dte<>b.dte)

END

--Now show the table contents:
--no duplicates, and only the records that had the max date are left
select * from tblDulicateDates

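For what it's worth, the same clean-up can usually be done without a cursor. The following is only a minimal sketch against the example table in this post, not a tested drop-in for the real data. It assumes that (Num1, Num2) identifies a group of duplicates and that the record to keep is the one with the latest dte. The aliases t and k are placeholder names.

```sql
-- Sketch: one set-based DELETE in place of the cursor loop.
-- Assumption: (Num1, Num2) identifies a duplicate group; the newest dte is kept.
delete t
from tblDulicateDates as t
join (
    -- latest date per duplicate group
    select Num1, Num2, max(dte) as latest
    from tblDulicateDates
    group by Num1, Num2
) as k
    on k.Num1 = t.Num1 and k.Num2 = t.Num2
where t.dte < k.latest
```

Because the comparison is strict, records that share the latest dte within a group are all kept, which is also how the cursor loop behaves.
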
regards
Feb 14 '08 #2
Thank you. I will give this a try!

-Jeanne

Feb 15 '08 #3
The sample code worked well. It didn't quite do what I need, though. I tried to fit it into my situation, but had no luck.

I have been given a table (TABLEDUPS) of known duplicate records. I need to delete records from a different table (TABLEEVENTS) based on this duplicate records table.

So what I really need to do is look at TABLEDUPS to determine the duplicate records to keep (most recent) and delete records in TABLEEVENTS that match the remaining record(s) in TABLEDUPS.

Any ideas...? Thanks.



Feb 19 '08 #4
ck9663
2,878 Expert 2GB
"delete records in TABLEEVENTS that match the remaining record(s) in TABLEDUPS."

Could you define "match"? Which columns will you use to determine whether it matches? It would help if you posted the structure of those two tables.

-- CK
Feb 19 '08 #5
They need to match on key fields. The structure for these key fields is the same for both tables:
Column      Type      Length
Region      varchar   2
ID          varchar   9
Claim       varchar   6
SVCDate     datetime  8
Modifier    varchar   2
Provider    varchar   15
Profess     varchar   15
Place       varchar   2
ReportDate  datetime  8

Once the match is made, I need to look at ReportDate in TABLEDUPS, then delete from TABLEEVENTS all records except the one with the most recent date.

Thanks.
Feb 21 '08 #6
ck9663
2,878 Expert 2GB
Run this first:
select tableevents.*
from tableevents
-- inner join rather than left join: a left join would return every row in
-- tableevents, not just the ones older than the latest ReportDate for their key
inner join
(select Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place, max(ReportDate) as latest from TABLEDUPS
group by Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place) as dups
on dups.Region = tableevents.Region and dups.ID = tableevents.ID
 and dups.Claim = tableevents.Claim and dups.SVCDate = tableevents.SVCDate
 and dups.Modifier = tableevents.Modifier and dups.Provider = tableevents.Provider
 and dups.Profess = tableevents.Profess and dups.Place = tableevents.Place
 and dups.latest > tableevents.ReportDate

If it returns the records you want to delete, you can turn it into a DELETE query. I'm asking you to run this first so that you don't delete rows you want to keep. By running a SELECT first, you can check whether you're deleting the right rows. It's always good to have a backup as well.
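As a rough sketch of what the DELETE form might look like: it keeps the same join conditions, uses an inner join so that only rows with an older ReportDate than the latest one in TABLEDUPS are affected, and wraps the statement in a transaction so the result can be checked before it is made permanent. The column alias rows_deleted is just an illustrative name, and the whole thing is untested against the real tables.

```sql
-- Sketch: the preview SELECT turned into a DELETE, wrapped in a transaction.
-- Assumption: rows in tableevents older than the latest ReportDate recorded in
-- TABLEDUPS for the same key are the ones to remove.
begin transaction

delete tableevents
from tableevents
inner join
(select Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place, max(ReportDate) as latest
 from TABLEDUPS
 group by Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place) as dups
on dups.Region = tableevents.Region and dups.ID = tableevents.ID
 and dups.Claim = tableevents.Claim and dups.SVCDate = tableevents.SVCDate
 and dups.Modifier = tableevents.Modifier and dups.Provider = tableevents.Provider
 and dups.Profess = tableevents.Profess and dups.Place = tableevents.Place
 and dups.latest > tableevents.ReportDate

-- number of rows the DELETE removed; compare it with the preview SELECT
select @@rowcount as rows_deleted

-- undo for now; change this to COMMIT TRANSACTION once the count looks right
rollback transaction
```

Note that key columns containing NULL never compare equal with =, so rows with a NULL in any of the join columns will not be matched by this join.
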

Happy coding

-- CK
Feb 21 '08 #7
Thanks. I'll give this a try.

Feb 22 '08 #8
