By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
434,903 Members | 2,097 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 434,903 IT Pros & Developers. It's quick & easy.

Delete duplicate records daily based on Date

Brilstern
100+
P: 207
I upload names into a database everyday and sometimes I am given duplicate names or names that have previously been submitted. Basically I want to delete all but the latest record based off of the date. I have seen many post about this but I haven't really been able to get anything to work. On my table [Inbound] I have fields [NAME], [SSN], [RANK], [ULN], and [DATE]. I want to delete all duplicate records that have the same SSN excluding Null values. The catch is I want to delete records with the older date. I understand the Min/Max setting on a delete query but I am having trouble getting it to work properly. I figure I am way off so I am starting over. Below is a page that I read with some details regarding the issue but I don't understand access enough to get what I need out of it.

http://bytes.com/topic/access/answer...licate-records

NeoPa,

I think your post might have the closest answer but I am not sure where to go with it.

Sgt B
Dec 28 '11 #1

✓ answered by NeoPa

Lines #4 & #5 of the first batch of SQL from post #17 should have been omitted. Including those lines was an error.

I'm a bit flummoxed on the third one (from post #18) as the table to be deleted from is specified explicitly on line #1. The Help section for the DELETE clause in Access is not very helpful and quite ambiguous, nevertheless I believe I've struggled with this before and found this format both to be consistent with what is there, as well as to work. That said, I knocked up an [Inbound] table with some indicative test data and was unable to get the query to execute :-( It basically wouldn't work, even though it conformed to the syntax indicated in the Help section.

This leaves my best suggestion to date as :
Expand|Select|Wrap|Line Numbers
  1. DELETE *
  2. FROM   [Inbound]
  3. WHERE  ([ID] IN(SELECT DISTINCT tI1.ID
  4.                 FROM   [Inbound] AS [tI1]
  5.                        INNER JOIN
  6.                        [Inbound] AS [tI2]
  7.                   ON   (tI1.SSN = tI2.SSN)
  8.                 WHERE  (tI1.SSN IS NOT NULL)
  9.                   AND  (tI1.Date < DateAdd('d', -10, tI2.Date))))
A couple of small changes beyond simply removing lines #4 & #5, but basically almost identical to the first suggestion of post #17. It worked for me logically (It executed correctly and removed the correct records), but I have no information on its relative performance.

PS. If you find the performance unacceptable then try removing the predicate DISTINCT from line #3 and see how that goes. In my tests both versions worked, but my data wasn't verbose enough to indicate which would handle better under load.

Share this Question
Share on Google+
32 Replies


Expert Mod 2.5K+
P: 2,545
The example below uses two subqueries, one to match the SSN and the other to exclude rows containing the maximum date for that SSN. Please replace the table name 'tblTest' in the example with the real name of the table concerned.

Expand|Select|Wrap|Line Numbers
  1. DELETE 
  2.   *
  3. FROM  
  4.   tblTest AS X
  5. WHERE 
  6.  (X.SSN In 
  7.     (Select 
  8.        SSN 
  9.      FROM 
  10.        tblTest as A 
  11.      WHERE 
  12.        A.SSN=X.SSN
  13.      )
  14.  ) 
  15. AND 
  16.  (X.Date Not In 
  17.     (Select 
  18.        Max([Date]) as MaxDate 
  19.      FROM 
  20.        tblTest as A 
  21.      WHERE 
  22.        A.SSN=X.SSN
  23.      )
  24.  );
Test data:

Expand|Select|Wrap|Line Numbers
  1. NAME     SSN    RANK  ULN  DATE
  2. J Bloggs 10101  Sgt   NK   28/12/2011
  3. J Bloggs 10101  Sgt   NK   05/12/2011
  4. J Bloggs 10101  Sgt   NK   03/11/2011
  5. J Bloggs 10101  Sgt   NK   01/11/2011
  6. M Mouse  20000  Lt    A    01/11/2011
  7. M Mouse  20000  Lt    A    02/11/2011
  8. M Mouse  20000  Lt    A    03/11/2011
  9. M Mouse  20000  Lt    A    04/11/2011
  10. M Mouse  20000  Lt    A    05/11/2011
  11. M Mouse  20000  Lt    A    06/11/2011
  12. S Clause 16001  Pt    B    28/12/2011
After deletion:

Expand|Select|Wrap|Line Numbers
  1. NAME     SSN    RANK  ULN  DATE
  2. J Bloggs 10101  Sgt   NK   28/12/2011
  3. M Mouse  20000  Lt    A    06/11/2011
  4. S Clause 16001  Pt    B    28/12/2011
-Stewart
Dec 28 '11 #2

Expert 100+
P: 446
Stewart,
A nice piece of code! I could never get this correlated sub-query stuff to work in Access, so well done!

Stevan,
Personally I hate deleting and would sooner not add the data to create a duplicate in the first place. I would sooner loop through the INCOMING and IF the SSN already existed in the main data then do an UPDATE command, else do the APPEND.

I'm also surmising that there may be subordinate (related) records associated with the table that holds [Name],[SSN],[Rank]... If you link via [SSN] then ok but a US SSN is nine digits and two dashes, i.e. an 11 character string, not what I would choose as a key field to link to other tables. If they were linked via a Long Integer (autonumber) field then the subordinated data would be orphaned when the earlier record was deleted.

But perhaps other people just have a simple life!
S7
Dec 28 '11 #3

NeoPa
Expert Mod 15k+
P: 31,492
Generally I would agree with S7 when they say :
"Personally I hate deleting and would sooner not add the data to create a duplicate in the first place. I would sooner loop through the INCOMING and IF the SSN already existed in the main data then do an UPDATE command, else do the APPEND."

In your situation though, where items to be removed from the overall data can come from both sources (previously good data as well as new incoming data), then I'd be inclined to do the import first and then tidy up in a single table.

The following SQL, which is on very similar lines to Stewart's mind you, should do the trick for you :

Expand|Select|Wrap|Line Numbers
  1. DELETE *
  2. FROM   [Inbound] AS tIo
  3. WHERE  ([SSN] Is Not Null)
  4.   AND  ([Date] < (SELECT Max([Date])
  5.                   FROM   [Inbound] AS tIi
  6.                   WHERE  (tIi.SSN = tIo.SSN)))
PS. Your situation is quite different from the one in the other thread. There the question asked specifically for a VBA solution. Yours doesn't require such a stipulation and the SQL approach would be far preferable (IMHO).
Dec 28 '11 #4

Brilstern
100+
P: 207
Thank you all, I personally prefer VBA as I understand it better, SQL just throws me for a loop for some reason. (I am currently reading Access for Dummies) I currently have the code below and realized I was deleting both the original record and the new reason so the last few days have been spent re-consolidating excel sheets to refresh my data. I would like to delete because I am linking this data to MS SharePoint and others use my daily updated roster to track their individual personnel so I need to be able to only have the most current data on the table as well as give them the means to research and sort through the data only on the table. I have multiple queries and other tools on the database but they can only see the two separate tables I update everyday. I only today have found out I need to tweak this a little more and I think I can use one of the above solutions but I haven't had the chance to apply any yet. Even though I want to Keep the newest record I also need to involve the ULN. Basically I need to delete records with both the same SSN and ULN.

Expand|Select|Wrap|Line Numbers
  1. Private Sub Command27_Click()
  2.  
  3.  DoCmd.SetWarnings False
  4.  
  5.     'Get confirmation from user
  6.       If vbYes = MsgBox("Are you sure you want to import today's JRC Manifest?", vbYesNo, "Importing") Then
  7.         'Check for file existance Inbound
  8.           If Len(Dir("\\lnknfs01\share\rcsw\c-1\0-II MEF (FWD) (1 MAR 2011)\0-Manpower Section\JRC Manifest\Inbound\In.xlsx")) = "0" Then
  9.              'File does not exist, import blank sheet
  10.               DoCmd.RunSavedImportExport "Import-Inblk"
  11.             Else
  12.               'File exists, import
  13.               DoCmd.RunSavedImportExport "Import-In"
  14.  
  15.           End If
  16.  
  17.  
  18.         'Check for file existance outbound
  19.           If Len(Dir("\\lnknfs01\share\rcsw\c-1\0-II MEF (FWD) (1 MAR 2011)\0-Manpower Section\JRC Manifest\Outbound\Out.xlsx")) = "0" Then
  20.               'File does not exist, import blank sheet
  21.               DoCmd.RunSavedImportExport "Import-Outblk"
  22.             Else
  23.               'File exists, import
  24.               DoCmd.RunSavedImportExport "Import-Out"
  25.            End If
  26.  
  27.        'Done importing, update screen
  28.          DoCmd.Requery ""
  29.       End If
  30.     'Add Date
  31.         DoCmd.OpenQuery "inbound query"
  32.         DoCmd.OpenQuery "outbound query"
  33.     'Date added update screen
  34.         DoCmd.Requery ""
  35.  
  36.     'Delete Duplicates
  37.         DoCmd.OpenQuery "Delete Dup SSN for Inbound", acViewNormal, acEdit
  38.         DoCmd.OpenQuery "Delete Dup SSN for Outbound", acViewNormal, acEdit
  39.     'Date added update screen
  40.         DoCmd.Requery ""
  41.  
  42.  DoCmd.SetWarnings True
  43.  
  44.     End Sub
The below snipit is what I need to edit because this is what is messing up my data. I am trying to sub in something similar to the query NeoPa has provided but with the slight change. Ill get back on after I experiment a little. I appreciate all the help and the feedback.


Expand|Select|Wrap|Line Numbers
  1. 'Delete Duplicates
  2.         DoCmd.OpenQuery "Delete Dup SSN for Inbound", acViewNormal, acEdit
  3.         DoCmd.OpenQuery "Delete Dup SSN for Outbound", acViewNormal, acEdit
Sierra7,

Not to worried about primary keys as SSN is not my primary key but it does have a auto ID via SharePoint. After you link a table to SharePoint you lose the choice and it creates an Auto ID.

Sgt B
Dec 30 '11 #5

Brilstern
100+
P: 207
Ok, after all said and done I am trying this (currently running). Hope it works. Thx NeoPa.

Expand|Select|Wrap|Line Numbers
  1.     DELETE *
  2.     FROM   [Inbound] AS tIo
  3.     WHERE  ([SSN] Is Not Null)
  4.       AND  ([Date] < (SELECT Max([Date])
  5.                       FROM   [Inbound] AS tIi
  6.                       WHERE  (tIi.SSN = tIo.SSN)
  7.                       AND    (tIi.ULN = tIo.ULN)))
Also thx to everybody else. all input is very helpful to a learning access dummy!

Sgt B

** And of course this is pertaining about 50000 records so it is taking forever so it may be tomorrow before I give any feedback, sorry guys **
Dec 30 '11 #6

Brilstern
100+
P: 207
Ok NeoPa, This worked great but runs very slow. Not really sure why but it takes forever!! But I appreciate it. I am trying to add in another stipulation but I'm not really sure how to word it, currently playing with it but i thought I might ask anyways in case I cant find it. Instead of the ULN criteria I want it to see if the dates are less then 10 days apart. Is so delete the older one, if not do nothing. something kinda like this but obviously in correct SQL format which I don't know how to do.

Expand|Select|Wrap|Line Numbers
  1.     DELETE *
  2.     FROM   [Inbound] AS tIo
  3.     WHERE  ([SSN] Is Not Null)
  4.       AND  ([Date] < (SELECT Max([Date]) {date - 10}??
  5.                       FROM   [Inbound] AS tIi
  6.                       WHERE  (tIi.SSN = tIo.SSN)
Dec 30 '11 #7

NeoPa
Expert Mod 15k+
P: 31,492
Some good thinking there Sgt. The reason this is taking so long is because the subquery is run once for every record it processes. That's 50,000+ queries trying to run. It'll take a while I would guess. A faster-running alternative would involve JOINing two recordsets together via the PK, and would require more knowledge of your table design than we currently have, as well as a fair bit of careful consideration.

For now, why not try out this amended query to handle your ten days grace (NB. For something like this it is critical that you test that it will do what you want prior to running it on the live copy of your data. What we offer are guesses from a distance without the benefit of a full understanding of your database.) :
Expand|Select|Wrap|Line Numbers
  1. DELETE *
  2. FROM   [Inbound] AS tIo
  3. WHERE  ([SSN] Is Not Null)
  4.   AND  ([Date] < DateAdd('d', -10,
  5.                          (SELECT Max([Date])
  6.                           FROM   [Inbound] AS tIi
  7.                           WHERE  (tIi.SSN = tIo.SSN)
  8.                           AND    (tIi.ULN = tIo.ULN))))
Dec 30 '11 #8

Expert 100+
P: 446
Hi
Quote "** And of course this is pertaining about 50000 records so it is taking forever so it may be tomorrow before I give any feedback, sorry guys **"

I think this emphasises the need to clean-out the existing data but make provision for future additions not to cause duplicates.

Ofcourse, if we were dealing with the British army we would have much smaller numbers !-)
S7
Dec 30 '11 #9

Brilstern
100+
P: 207
NeoPa,

The provided Query seams to be more what I had in mind. (noting that the ULN verification is no longer needed due to the 10 day window)Still pretty new at this so I am not sure what all you need to get a better understanding of my database but Ill give a basic description.

Every Morning I get two emails with two different excel sheets in them. From the email I drop them into the "Inbound:" folder and the "Outbound" folder. After that I rename the files "In" and "Out" respectively as well as the first tab in the sheet. From there I used a DoCmd.RunSavedQuery to amend these two roster to the two table's name "Inbound" and "Outbound". After that it will auto add the date with a simple amend query and then from there is where I want it to look for and delete duplicate entries via the SSN being submitted more than once within the 10 day window. {{//the reason for this is because I get these rosters as they manifest not as they actually fly so I get the same names three days in a row sometimes depending on how hard it is to get out. I am only wanting to capture the last day (or the day they flew)// the 10 day window is to keep from deleting those people that have manifested for different reasons in the past and are now coming again after leaving//}} Below is the VBA i have used to do this. The delete queries are where I will insert the SQL to get the correct outcome.

Expand|Select|Wrap|Line Numbers
  1. Private Sub Command27_Click()
  2.  
  3.      DoCmd.SetWarnings False
  4.  
  5.         'Get confirmation from user
  6.           If vbYes = MsgBox("Are you sure you want to import today's JRC Manifest?", vbYesNo, "Importing") Then
  7.             'Check for file existance Inbound
  8.               If Len(Dir("\\lnknfs01\share\rcsw\c-1\0-II MEF (FWD) (1 MAR 2011)\0-Manpower Section\JRC Manifest\Inbound\In.xlsx")) = "0" Then
  9.                  'File does not exist, import blank sheet
  10.                   DoCmd.RunSavedImportExport "Import-Inblk"
  11.                 Else
  12.                   'File exists, import
  13.                   DoCmd.RunSavedImportExport "Import-In"
  14.  
  15.               End If
  16.  
  17.  
  18.             'Check for file existance outbound
  19.               If Len(Dir("\\lnknfs01\share\rcsw\c-1\0-II MEF (FWD) (1 MAR 2011)\0-Manpower Section\JRC Manifest\Outbound\Out.xlsx")) = "0" Then
  20.                   'File does not exist, import blank sheet
  21.                   DoCmd.RunSavedImportExport "Import-Outblk"
  22.                 Else
  23.                   'File exists, import
  24.                   DoCmd.RunSavedImportExport "Import-Out"
  25.                End If
  26.  
  27.            'Done importing, update screen
  28.              DoCmd.Requery ""
  29.           End If
  30.         'Add Date
  31.             DoCmd.OpenQuery "inbound query"
  32.             DoCmd.OpenQuery "outbound query"
  33.         'Date added update screen
  34.             DoCmd.Requery ""
  35.  
  36.         'Delete Duplicates
  37.             DoCmd.OpenQuery "Delete Dup SSN for Inbound", acViewNormal, acEdit
  38.             DoCmd.OpenQuery "Delete Dup SSN for Outbound", acViewNormal, acEdit
  39.         'Date added update screen
  40.             DoCmd.Requery ""
  41.  
  42.      DoCmd.SetWarnings True
  43.  
  44.         End Sub
  45.  
Does that help you understand the goal any better?

** There are no relationships as this is a SharePoint linked Database, in which it doesn't allow relationships. It also adds an auto ID but I do not use it at all. It is hidden. **

Sgt B
Dec 30 '11 #10

Brilstern
100+
P: 207
Sierra7,

I understand your thought here but that causes an issue seeing as to I want it to delete the old record for several reasons.

1. The new record might have an updated ULN or Unit
2. I want to keep the ID order in order as best as possible for future ideas.
3. In some cases I need to keep duplicates in the case that they came were manifested at a much earlier date and have left and came back.

That is why I need it to add all data and then make changes based on the data all at once.

Sgt B
Dec 30 '11 #11

NeoPa
Expert Mod 15k+
P: 31,492
Actually Stevan, I was thinking more of the meta-data (info about the layout / structure) for the table, and within that particularly, the PK. Meta-data can be posted as illustrated by the following example :
Table Name=[tblStudent]
Expand|Select|Wrap|Line Numbers
  1. Field           Type      IndexInfo
  2. StudentID       AutoNumber    PK
  3. Family          String        FK
  4. Name            String
  5. University      String        FK
  6. Mark            Numeric
  7. LastAttendance  Date/Time
PK & FK stand for Primary Key & Foreign Key respectively. Never use TABs in this as the layout gets mucked up. Use spaces and all is fine (Remembering of course, that BB Code takes up no space in the resultant post).
Dec 30 '11 #12

Brilstern
100+
P: 207
ok still a little lost but here is what I understand from your post


Table Name=[Inbound]

Expand|Select|Wrap|Line Numbers
  1. Field             Type
  2. NAME              Text
  3. SSN               Text 
  4. RANK              Text   
  5. ULN               Text
  6. UNIT              Text
  7. DATE              Date/Time    
  8. MONTH             Text
  9.  
No PK or FK


**this is for my test table i am currently working, as soon as i relink up with SharePoint it adds like 10 extra field...**
Sgt B
Dec 30 '11 #13

NeoPa
Expert Mod 15k+
P: 31,492
Stevan:
It also adds an auto ID but I do not use it at all. It is hidden.
Your post #10 seemed to indicate the table had a PK that was added for you.
NeoPa:
A faster-running alternative would involve JOINing two recordsets together via the PK,
As I indicated in post #8, the PK would be a requirement for exploring a solution based on joining the table to itself.
Dec 31 '11 #14

Brilstern
100+
P: 207
Table Name=[Inbound]

Expand|Select|Wrap|Line Numbers
  1. Field                    Type
  2. ID                       AutoNumber
  3. NAME                     Text
  4. SSN                      Text 
  5. RANK                     Text   
  6. ULN                      Text
  7. UNIT                     Text
  8. DATE                     Date/Time    
  9. MONTH                    Text
  10. CONTENT TYPE             Text
  11. FILE TYPE                Text
  12. ATTACHMENTS              Attachments
  13. WORKFLOW INSTANCE ID     Text
  14. MODIFIED                 Date/Time
  15. CREATED                  Date/Time
  16. CREATED BY               Number
  17. MODIFIED BY              Number
  18. URL PATH                 Text
  19. PATH                     Text
  20. ITEM TYPE                Text
  21. ENCODED ABSOLUTE URL     Text
This what it looks like after I upload it to SharePoint. Please keep in mind I cannot create relationships nor can I edit the fields that SharePoint added. Unfortunately SharePoint limits MS Access a lot.
Dec 31 '11 #15

NeoPa
Expert Mod 15k+
P: 31,492
I think I may have hit a brick-wall here Stevan. What you have provided is perfect, so it's not related to that, but a DELETE query must work with what is essentially an updatable query (See Reasons for a Query to be Non-Updatable). I'll have a look for you anyway (as you've gone to the trouble of providing exactly what I requested), but I expect I can find a faster-running display query, without unfortunately, being able to turn that same basic query successfully into a DELETE one for you, but we'll see.
Dec 31 '11 #16

NeoPa
Expert Mod 15k+
P: 31,492
I may have a valid version here. Try it out and let us know how you fare with it. There is a possible faster version below it, but I'm not sure whether or not the difference will have a noticeable/any effect with your data :
Expand|Select|Wrap|Line Numbers
  1. DELETE *
  2. FROM   [Inbound] AS [tIo]
  3. WHERE  ([ID] IN(SELECT tI1.ID
  4.                      , tI1.Date
  5.                      , tI2.Date
  6.                 FROM   [Inbound] AS [tI1]
  7.                        INNER JOIN
  8.                        [Inbound] AS [tI2]
  9.                   ON   (tI1.SSN = tI2.SSN)
  10.                 WHERE  (tI1.SSN IS NOT NULL)
  11.                   AND  (tI1.Date < DateAdd('d', -10, tI2.Date))))
Other version to try :
Expand|Select|Wrap|Line Numbers
  1. DELETE *
  2. FROM   [Inbound]
  3. WHERE  ([ID] IN(SELECT DISTINCT [ID]
  4.                 FROM   (SELECT tI1.ID
  5.                              , tI1.Date
  6.                              , tI2.Date
  7.                         FROM   [Inbound] AS [tI1]
  8.                                INNER JOIN
  9.                                [Inbound] AS [tI2]
  10.                           ON   (tI1.SSN = tI2.SSN)
  11.                         WHERE  (tI1.SSN IS NOT NULL)
  12.                           AND  (tI1.Date < DateAdd('d', -10, tI2.Date)))))
Dec 31 '11 #17

NeoPa
Expert Mod 15k+
P: 31,492
BTW. The trick, if it works, to getting the query to be updatable is simply to put the non-updatable SQL into the WHERE clause within the IN(). This means that the whole (non-updatable) query is run and simply provides a set of IDs which match those of the records that need deleting. Hopefully that should work.

Having looked through the SQL again though, I see that the approach I thought I needed to use, which was non-updatable, proved unnecessary in the end (I think). If I'm right that means that the simpler SQL below may even work as is (which requires no sub-queries at all even, and should execute considerably faster). Give it a try anyway and report the results here if you wouldn't mind :
Expand|Select|Wrap|Line Numbers
  1. DELETE tI1.*
  2. FROM   [Inbound] AS [tI1]
  3.        INNER JOIN
  4.        [Inbound] AS [tI2]
  5.   ON   (tI1.SSN = tI2.SSN)
  6. WHERE  (tI1.SSN IS NOT NULL)
  7.   AND  (tI1.Date < DateAdd('d', -10, tI2.Date))))
Dec 31 '11 #18

Brilstern
100+
P: 207
OK so all the answers!?!

NeoPa,

I have tried using all three queries and to no avail I of course ran into problems (not because of you, but just my luck). The first query provided gives me an error:

Expand|Select|Wrap|Line Numbers
  1. You have written a subquery that can return more than
  2. one field without using the EXISTS reserved word in the
  3. main query's FROM clause. Revise the SELECT statement
  4. of the subquery to request only one field.
The second query ran all night with no progress, eventually I had to restart MS Access via the task manager in windows.

The third one gave the below error:

Expand|Select|Wrap|Line Numbers
  1. Specify the table containing the records you want to
  2. delete.
  3.  
At a loss here, not really sure where to go...

Sgt B

!!(I greatly appreciate all the help)!!
Jan 3 '12 #19

Expert Mod 2.5K+
P: 2,545
I have followed this one with interest, and have tried a number of ways to get Access's very frustrating DELETE queries to work with some form of join, but without success. As subqueries have been found to be very slow in operation I have tried out a hybrid approach using a joined query to select records for deletion, and then using the query in a two-recordset VBA function to do the deletion.

I tested this approach on a ten-times larger set of data than has been mentioned so far - a table containing 500,500 records, of which 170,625 were to be selected for deletion.

Calling the delete function from a form and then timing its operation from start to finish, the deletion of 170,625 records took 5 minutes 42 seconds on my laptop (which is far from being a performance machine).

The first qry, qryMaxDate, returns the maximum date for that SSN. The second qry, qryDeleteSelect, joins Inbound to qryMaxDate on the SSN and then uses datediff to select records between 1 and 10 days old. Within the VBA function this query is then opened and looped through to delete records within the inbound table.

qryMaxDate
Expand|Select|Wrap|Line Numbers
  1. SELECT   
  2.       SSN, 
  3.       Max([Date]) AS MaxDate
  4. FROM     
  5.       INBOUND
  6. GROUP BY 
  7.       SSN
  8. ORDER BY 
  9.       SSN;
qryDeleteSelect
Expand|Select|Wrap|Line Numbers
  1. SELECT
  2.        INBOUND.ID, 
  3.        INBOUND.SSN, 
  4.        INBOUND.Date, 
  5.        qryMaxDate.MaxDate, 
  6.        DateDiff('d',[Date],[MaxDate]) AS Diff
  7. FROM   
  8.        INBOUND 
  9. INNER JOIN 
  10.        qryMaxDate 
  11. ON 
  12.        INBOUND.SSN = qryMaxDate.SSN
  13. WHERE 
  14.        (((DateDiff('d',[Date],[MaxDate])) Between 1 And 10))
  15.  
  16. ORDER BY 
  17.        INBOUND.ID;
VBA function
Expand|Select|Wrap|Line Numbers
  1. Public Function fDeleteRecords() As Long
  2.     'Deletes records from table INBOUND where those records match the ID field of
  3.     'records selected for deletion by qryDeleteSelect.
  4.     'Return value is the count of the number of records deleted
  5.     '
  6.     'Note: table Inbound's ID field must be the primary key - an autonumber
  7.     'or longint value which will by default allow the recordset to be ordered ascending
  8.     'on the ID value.
  9.     '
  10.     Dim rsInbound As DAO.Recordset
  11.     Dim rsDeleteSelect As DAO.Recordset
  12.     Dim lngRecCount As Long
  13.     Dim varRecKey As Long
  14.     Dim varDeleteKey As Long
  15.     DoCmd.Hourglass True
  16.     Set rsInbound = CurrentDb.OpenRecordset("Inbound")
  17.     Set rsDeleteSelect = CurrentDb.OpenRecordset("qryDeleteSelect")
  18.     Do While Not (rsDeleteSelect.EOF Or rsInbound.EOF)
  19.         varDeleteKey = rsDeleteSelect("ID")
  20.         varRecKey = rsInbound("ID")
  21.         If varRecKey <> varDeleteKey Then
  22.             rsInbound.MoveNext
  23.         Else
  24.             Application.Echo True, "Deleting record " & varRecKey
  25.             DoEvents
  26.             rsInbound.Delete
  27.             lngRecCount = lngRecCount + 1
  28.             rsDeleteSelect.MoveNext
  29.             rsInbound.MoveNext
  30.         End If
  31.     Loop
  32.     rsDeleteSelect.Close
  33.     rsInbound.Close
  34.     DoCmd.Hourglass False
  35.     fDeleteRecords = lngRecCount
  36. End Function
  37.  
I'd have liked to take a pure SQL approach as NeoPa did, but subqueries run too slowly with large datasets, and even using his stripped down SQL with all the thought that went into it I could not get a delete query to function without using subqueries whatever I did!

Please note the assumption that there is indeed an ID field of autonumber or other longint form in the 'inbound' table, which allows the recordset to be opened in primary key order by default.

When I tried the VBA function using a query based on table Inbound ordered by ID field instead of opening the table directly Access errored after around 50,000 records had been processed, with the error message advising that I needed to increase the MaxLocks registry key value. I'm using Access 2010 on my laptop, and I'm not sure if this is something peculiar to A2010 or not.

-Stewart
Jan 3 '12 #20

NeoPa
Expert Mod 15k+
P: 31,492
Lines #4 & #5 of the first batch of SQL from post #17 should have been omitted. Including those lines was an error.

I'm a bit flummoxed on the third one (from post #18) as the table to be deleted from is specified explicitly on line #1. The Help section for the DELETE clause in Access is not very helpful and quite ambiguous, nevertheless I believe I've struggled with this before and found this format both to be consistent with what is there, as well as to work. That said, I knocked up an [Inbound] table with some indicative test data and was unable to get the query to execute :-( It basically wouldn't work, even though it conformed to the syntax indicated in the Help section.

This leaves my best suggestion to date as :
Expand|Select|Wrap|Line Numbers
  1. DELETE *
  2. FROM   [Inbound]
  3. WHERE  ([ID] IN(SELECT DISTINCT tI1.ID
  4.                 FROM   [Inbound] AS [tI1]
  5.                        INNER JOIN
  6.                        [Inbound] AS [tI2]
  7.                   ON   (tI1.SSN = tI2.SSN)
  8.                 WHERE  (tI1.SSN IS NOT NULL)
  9.                   AND  (tI1.Date < DateAdd('d', -10, tI2.Date))))
A couple of small changes beyond simply removing lines #4 & #5, but basically almost identical to the first suggestion of post #17. It worked for me logically (It executed correctly and removed the correct records), but I have no information on its relative performance.

PS. If you find the performance unacceptable then try removing the predicate DISTINCT from line #3 and see how that goes. In my tests both versions worked, but my data wasn't verbose enough to indicate which would handle better under load.
Jan 3 '12 #21

Brilstern
100+
P: 207
NeoPa,

I appreciate all the help.

Running the latest query at the moment. I will give more feed back after it runs.

!!THANKS!!

Sgt B
Jan 3 '12 #22

Brilstern
100+
P: 207
Ok so weird outcome. After running it (only took about 30 min on 52000 records) it deleted all records that did not have a SSN. Reason being, it is not reading all the records that don't contain a SSN as NULL, it reads them as BLANK!?! I believe this is because all these records were imported instead of being created in access. So from there I made a logical change (hopefully correct, I was just guessing) and I am running the below query now. Ideas?

Expand|Select|Wrap|Line Numbers
  1.     DELETE *
  2.     FROM   [Inbound]
  3.     WHERE  ([ID] IN(SELECT DISTINCT tI1.ID
  4.                     FROM   [Inbound] AS [tI1]
  5.                            INNER JOIN
  6.                            [Inbound] AS [tI2]
  7.                       ON   (tI1.SSN = tI2.SSN)
  8.                     WHERE  (tI1.SSN = NOT BLANK)
  9.                       AND  (tI1.Date < DateAdd('d', -10, tI2.Date))))
Sgt B
Jan 3 '12 #23

Brilstern
100+
P: 207
After I ran my query above and it found 0 records to delete...

Sgt B
Jan 3 '12 #24

NeoPa
Expert Mod 15k+
P: 31,492
Stevan Bias:
Ideas?
Yes. First determine exactly what is in the data that appears to you as blank. The three most common values are :
  1. Null
  2. Empty string ('').
  3. Space(s) (' ').
It appears that these records don't respond to being filtered by Null, so check out what is actually there first and report back. #2 is certainly more likely statistically, so check for that first. Open the table and add a filter and see if the number of records returned matches the number that don't have visible values. To filter by '' simply enter '' into the [SSN] field in the filter.

When we know what we're working with we can move forward, but in case it's the likely option the code for line #8 would be :
Expand|Select|Wrap|Line Numbers
  1.                     WHERE  (tI1.SSN > '')
Jan 3 '12 #25

Brilstern
100+
P: 207
Ok, so the above is good, running it now...
Jan 3 '12 #26

NeoPa
Expert Mod 15k+
P: 31,492
Stevan Bias:
Ok, so the above is good
I take that to mean that the 'empty' values are actually recognised as '' (empty strings)?
Jan 3 '12 #27

Brilstern
100+
P: 207
Yes sir. And it eventually stopped running and froze... This is such bad luck!

I am going to retry in to morning, so I will try to get on with any feedback here in about 12 hours. Please excuse my slow working speed.
Jan 3 '12 #28

NeoPa
Expert Mod 15k+
P: 31,492
Stevan Bias:
Please excuse my slow working speed.
Go at your own speed. I have plenty to keep me busy until you post another reply ;-)

PS. Don't assume that just because Access freezes that it's stopped running. All long-running queries cause Access to freeze. It doesn't indicate a problem at all. Just that Access is still busy and the interface is a little useless.
Jan 3 '12 #29

Brilstern
100+
P: 207
I know that but unfortunately I have other databases to work on so I had to stop it to get something accomplished. I just posted a question about that one to...
Jan 3 '12 #30

NeoPa
Expert Mod 15k+
P: 31,492
You know you can run multiple instances of Access right? Each with their own separate database running?
Jan 3 '12 #31

Brilstern
100+
P: 207
Ya but I am on a government computer and it runs out of resources very easily. I cant run Access in two different instances and still function properly on my computer ie. Email, Internet, etc. All is well though. Not much I can do about it.
Jan 3 '12 #32

Brilstern
100+
P: 207
Just an update of anyone looking for an answer, just tried this recently on a database and it worked. The best answer chosen above is working.
Apr 23 '14 #33

Post your reply

Sign in to post your reply or Sign up for a free account.