By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
424,491 Members | 858 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 424,491 IT Pros & Developers. It's quick & easy.

Removing Duplicates based on most recent

bard777
P: 23
I fully expect to have a duh moment when I read replies to this....BUT....

I have a table with duplicates in the [LN] field (could be 2 records with same value, might be 7 records). I need to delete all but the the most recent based on the [eDate] field.

I hope this is enough info.

Thanks.
May 12 '10 #1

✓ answered by NeoPa

This should handle the nesting of the subqueries and the duplications when [eDate]s match :
Expand|Select|Wrap|Line Numbers
  1. DELETE tDo.*
  2. FROM   [tblDuplicates] AS tDo
  3. WHERE  tDo.ID Not In(
  4.     SELECT   TOP 1
  5.              tDi.ID
  6.     FROM     [tblDuplicates] AS tDi
  7.     WHERE    tDi.LN=tDo.LN
  8.     ORDER BY tDi.[eDate] DESC
  9.            , tDi.ID DESC
  10.     )

Share this Question
Share on Google+
12 Replies


100+
P: 122
@bard777
Do you have a unique key set up in your table?

If so, you could do something like this:
Expand|Select|Wrap|Line Numbers
  1. DELETE Table.* FROM Table
  2. WHERE [ID] NOT IN (SELECT Last(ID) FROM
  3. (SELECT ID, LN FROM Table ORDER BY eDate)
  4. GROUP BY LN);
That's a lot more sub-queries then I like to put into a criteria expression, but it gives you the idea. Also - do you ever have two records with the same LN and the same eDate? Would you want to keep them all, or does it matter which is deleted?

Things are a little trickier if there's no ID field. Another thing to consider is you could select the data you want with a GROUP BY clause and make a new table from it. I wouldn't recommend it if this is something you will be repeating, but it might be an option depending on your setup.
May 12 '10 #2

bard777
P: 23
@gershwyn
Yes, I have an ID field, so this is a good lead, thanks.

It is possible that [eDate] could be the same as well, but in this case either will do. The real issue is there is no data validation on the front end.
May 12 '10 #3

Jim Doherty
Expert 100+
P: 897
@bard777
The following function will open a recordset examining two columns LN and edate in a table named Table1 and stack records in Ascending order of LN and descending descending order of edate. The procedure loops records skipping over the first LN number (ie the latest date for that LN value) and deletes any other records having the same LN value in the stack. It repeats this process for each individual LN number found until the end of the table. The end result is an individual LN value in the first column and in the second column the latest date for that LN number. Obviously test this out on a copy table to see if it fits your needs

Expand|Select|Wrap|Line Numbers
  1. Function Get_Rid_Of_LNRecords_Except_Latest_EDate()
  2. On Error GoTo Err_Get_Rid_Of_LNRecords_Except_Latest_EDate
  3. Dim db As DAO.Database
  4. Dim rst As DAO.Recordset
  5. Set db = CurrentDb
  6. mysql = "SELECT Table1.LN, Table1.eDate "
  7. mysql = mysql & "FROM Table1 "
  8. mysql = mysql & "ORDER BY Table1.LN, Table1.eDate DESC;"
  9.  
  10. ' open a recordset of two columns
  11. ' sorting records by LN first then the edate descending
  12. Set rst = db.OpenRecordset(mysql, dbOpenDynaset)
  13. ' if a dataset is available loop through records
  14. ' skipping the first LN record encountered and deleting the rest
  15. ' the first LN record encountered will be first LN record for any given date
  16.  
  17. mypointer = rst!LN
  18.  
  19.         Do While Not rst.BOF And Not rst.EOF
  20.             'skip the first one
  21.             rst.MoveNext
  22.                     If mypointer = rst!LN Then
  23.                        rst.Delete
  24.                     Else
  25.                     'ressign the pointer because we have encountered
  26.                     ' a new LN
  27.                         mypointer = rst!LN
  28.                     End If
  29.          Loop
  30.  
  31. rst.Close
  32. mysql = ""
  33. Set rst = Nothing
  34. Set db = Nothing
  35. MsgBox "Process Complete", vbInformation, "System Message"
  36. Exit_Get_Rid_Of_LNRecords_Except_Latest_EDate:
  37.     Exit Function
  38.  
  39. Err_Get_Rid_Of_LNRecords_Except_Latest_EDate:
  40.     If Err = 3021 Then
  41.         'just ignore because encountered last row
  42.     MsgBox "Process Complete", vbInformation, "System Message"
  43.     Else
  44.         MsgBox "Error " & Err.Number & " " & Err.Description
  45.     End If
  46.     Resume Exit_Get_Rid_Of_LNRecords_Except_Latest_EDate
  47.  
  48. End Function
You can run the above if you paste it as is into a new module and then place a command button on a form (without using the wizard) then in the property box for the command button at the 'On click' event just type this =Get_Rid_Of_LNRecords_Except_Latest_EDate()
May 12 '10 #4

bard777
P: 23
@Jim Doherty
Thanks Jim...I had already started donig something similar so I went with that (for now anyways).

Here is what I have, but it is not working:

Expand|Select|Wrap|Line Numbers
  1. Set db = CurrentDb
  2. Dim rsDEDUPE As Recordset
  3. Dim varLN As String
  4. Dim varTic As Long
  5. Dim varToc As Long
  6.  
  7. varTic = 0 ' count total deleted duplicate records
  8. varToc = 0 ' count total for duplicate recordset
  9.  
  10. Set rsDEDUPE = db.OpenRecordset("_1duplicates for tbl_exceptions_CSV") ' rename query
  11. rsDEDUPE.MoveFirst
  12.  
  13. Do While Not rsDEDUPE.EOF
  14.  
  15.     If Not rsDEDUPE.EOF Then
  16.         varLN = rsDEDUPE(0)
  17.         rsDEDUPE.MoveNext
  18.         varToc = varToc + 1
  19.     End If
  20.  
  21. If Not rsDEDUPE.EOF Then
  22.    If Trim(varLN) = Trim(rsDEDUPE(0)) Then
  23.  
  24.         DoCmd.RunSQL ("DELETE * FROM tbl_exceptions_CSV WHERE [AutoKey] = " & rsDEDUPE(2))
  25.  
  26.         varTic = varTic + 1
  27.     End If
  28. End If
  29.  
  30. Loop
It is giving me a "Record Deleted" error on the live that is:

Expand|Select|Wrap|Line Numbers
  1. varLN = rsDEDUPE(0)
I have gotten this error in the past when I have a query that has a linked table that is linked to a linked table. The table that the query uses to create the recordset is local, so I don't know why this would be happening.
May 13 '10 #5

Jim Doherty
Expert 100+
P: 897
Have you looked at the logic of the one I sent you. I did take the time out to present you with something that aligns itself to your original post!

I am not so sure I understand what you are experiencing with the #deletion# thing. If you have an openrecordset and delete records in SQL (Delete From X Where etc etc) at the same time then obviously the recordset will reflect that
May 13 '10 #6

NeoPa
Expert Mod 15k+
P: 31,186
This should handle the nesting of the subqueries and the duplications when [eDate]s match :
Expand|Select|Wrap|Line Numbers
  1. DELETE tDo.*
  2. FROM   [tblDuplicates] AS tDo
  3. WHERE  tDo.ID Not In(
  4.     SELECT   TOP 1
  5.              tDi.ID
  6.     FROM     [tblDuplicates] AS tDi
  7.     WHERE    tDi.LN=tDo.LN
  8.     ORDER BY tDi.[eDate] DESC
  9.            , tDi.ID DESC
  10.     )
May 14 '10 #7

Jim Doherty
Expert 100+
P: 897
@NeoPa
Just a small point Neopa' the correlated inner loop needs descending order edate sorting otherwise the poster will be left with the 'earliest date' as opposed to the latest date having the benefit of two insights SQL and VBA. What more can one need?
May 14 '10 #8

NeoPa
Expert Mod 15k+
P: 31,186
You're absolutely right of course Jim :)

Strangely I've done a couple of similar ones today and thought I'd included that in this one, as I did in the other. Thanks for catching.
May 14 '10 #9

bard777
P: 23
Thanks NeoPa, that did the trick! I thought it could all be done in a query, I just couldn't wrap my mind around the logic. I also found out something interesting about using TOP 1....it will return multiple records if they meet the criteria (I didn't include the ID field in the ORDER first time around), didn't know it would do that.

Jim, thanks for the time you spent on your method. I had already started working on some VBA along the same lines before I saw your code. I didn't ignore your effort, I was trying to work with what I had instead of re-writing at that point. I did mean to seem ungratelful for your help.
May 18 '10 #10

NeoPa
Expert Mod 15k+
P: 31,186
No worries Bard. The TOP predicate catches many of us out believe me ;)
May 19 '10 #11

Jim Doherty
Expert 100+
P: 897
@bard777
Don't worry about it Bard I am not the 'sensitive soul' :) I much rather prefer that multiple methods came piling into threads the browsing individual not just yourself then has choice according to their particular flavour of working.

SQL,s TOP predicate will return more than 'one' record as you experienced because it returns TIES by default in Access (whereas in SQL Server it does not you explicitly request it) You simply code that out by using the DISTINCT predicate
May 19 '10 #12

NeoPa
Expert Mod 15k+
P: 31,186
Interesting Jim (DISTINCT predicate of the SELECT clause for handling ties). I normally include a unique reference field in the sorting to avoid them (as in current thread).
May 19 '10 #13

Post your reply

Sign in to post your reply or Sign up for a free account.