468,140 Members | 1,406 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 468,140 developers. It's quick & easy.

Conditional Delete of Duplicate Records from Access

I have a access query that has more than 400000 records including duplicates.First I need help in finding a way to filter the whole 400000 records and list all duplicates.
Lets say ID, Job & Description are the columns that make duplicates. For example, see the 2 rows below,

12345678, Cashier, Emergency Money
12345678, Cashier

I want to delete the second row because the its third column is blank.

Sometimes the rows could be like this too,

89898989, , Financial Aid
89898989, Accountant, Financial Aid

In this example, I want to delete the first row because its second column is blank.

Any help will be really appreciated.
Nov 3 '07 #1
7 5139
Jim Doherty
897 Expert 512MB
I have a access query that has more than 400000 records including duplicates.First I need help in finding a way to filter the whole 400000 records and list all duplicates.
Lets say ID, Job & Description are the columns that make duplicates. For example, see the 2 rows below,

12345678, Cashier, Emergency Money
12345678, Cashier

I want to delete the second row because the its third column is blank.

Sometimes the rows could be like this too,

89898989, , Financial Aid
89898989, Accountant, Financial Aid

In this example, I want to delete the first row because its second column is blank.

Any help will be really appreciated.
Hi AccessHunter,

Post your table name and fieldnames please for the table on which you wish to perfom the deletions that way any contributors can properly reference them in any coded responses given

Thanks

Jim :)
Nov 3 '07 #2
Hi AccessHunter,

Post your table name and fieldnames please for the table on which you wish to perfom the deletions that way any contributors can properly reference them in any coded responses given

Thanks

Jim :)
Sorry abotu not mentioning the correct table and column names.
Table Name - Loan Officer Details
Column names - Clerk ID, Position, Description

My requirement is to find any duplicate rows for (Clerk ID + Position + Description) and then delete the rows as per my previous email.

Thanks for the help.
Nov 3 '07 #3
Jim Doherty
897 Expert 512MB
Sorry abotu not mentioning the correct table and column names.
Table Name - Loan Officer Details
Column names - Clerk ID, Position, Description

My requirement is to find any duplicate rows for (Clerk ID + Position + Description) and then delete the rows as per my previous email.

Thanks for the help.

Given your datasize it might be wise to run this SQL over your table first to determine how many duplications you have where either one or the other columns ie Position or Description is null or zero length string.

In order for you to do this you must add an extra field to the table called RowID and set its datatype to autonumber (which will automatically populate it with data when you switch to table view) or 'Number' datatype in which case you must populate it yourself with sequential numbers in order to identify each individual row. The identification of each row is important because you would need to identify where you are within the recordset during any subsequent traversing of the data within the SQL.

The assumption I have made here is that IF ALL THREE fields are complete then duplications in that regard are OK and you would not want to delete those records, as they are complete and form records on the many side of a one to many relationship with another table? I don't know your database schema.

Have a look at this query first see what it shows you and then give me an idea on the number of rows it returns, highest and lowest counts etc.

Expand|Select|Wrap|Line Numbers
  1. SELECT [Loan Officer Details].[Clerk ID], [Loan Officer Details].Position, [Loan Officer Details].Description, Count([Loan Officer Details].RowID) AS Duplications, Min([Loan Officer Details].RowID) AS MinimumRow
  2. FROM [Loan Officer Details]
  3. GROUP BY [Loan Officer Details].[Clerk ID], [Loan Officer Details].Position, [Loan Officer Details].Description
  4. HAVING ((([Loan Officer Details].Position)="" Or ([Loan Officer Details].Position) Is Null) AND (([Loan Officer Details].Description)<>"" Or ([Loan Officer Details].Description) Is Not Null)) OR ((([Loan Officer Details].Position)<>"" Or ([Loan Officer Details].Position) Is Not Null) AND (([Loan Officer Details].Description)="" Or ([Loan Officer Details].Description) Is Null))
  5. ORDER BY [Loan Officer Details].[Clerk ID];
The fourth column of this query is designed to to give you a count of the number of duplications that exist for each Clerk ID shown in column 1 for each row where either one or the other columns have data in them or not.

The fifth column is designed to reflect the minimum ROWID number for that duplication set. Logic can ultimately be developed on that, much as to say something like this:

"Loop through the Loan Officer Details table and delete all records for the given Clerk ID displayed in this query row but only where the Position and Description fields are equal to the Position and Description fields shown in this query AND only where the ROWID is greater than the minimum RowID number that this query is displaying".

If this logic agrees with you and your stored data (the content of which you will see aggregated by ClerkID when you run this query) then we can advance to writing something that pulls out data, traverses it and deletes records according to that logic.

Theres no point in doing anything if my assumptions are wrong or if your data shows you something that you didnt expect? and which might get inadavertently deleted if any program flow didnt take it into account.

Regards

Jim :)
Nov 4 '07 #4
Given your datasize it might be wise to run this SQL over your table first to determine how many duplications you have where either one or the other columns ie Position or Description is null or zero length string.

In order for you to do this you must add an extra field to the table called RowID and set its datatype to autonumber (which will automatically populate it with data when you switch to table view) or 'Number' datatype in which case you must populate it yourself with sequential numbers in order to identify each individual row. The identification of each row is important because you would need to identify where you are within the recordset during any subsequent traversing of the data within the SQL.

The assumption I have made here is that IF ALL THREE fields are complete then duplications in that regard are OK and you would not want to delete those records, as they are complete and form records on the many side of a one to many relationship with another table? I don't know your database schema.

Have a look at this query first see what it shows you and then give me an idea on the number of rows it returns, highest and lowest counts etc.

Expand|Select|Wrap|Line Numbers
  1. SELECT [Loan Officer Details].[Clerk ID], [Loan Officer Details].Position, [Loan Officer Details].Description, Count([Loan Officer Details].RowID) AS Duplications, Min([Loan Officer Details].RowID) AS MinimumRow
  2. FROM [Loan Officer Details]
  3. GROUP BY [Loan Officer Details].[Clerk ID], [Loan Officer Details].Position, [Loan Officer Details].Description
  4. HAVING ((([Loan Officer Details].Position)="" Or ([Loan Officer Details].Position) Is Null) AND (([Loan Officer Details].Description)<>"" Or ([Loan Officer Details].Description) Is Not Null)) OR ((([Loan Officer Details].Position)<>"" Or ([Loan Officer Details].Position) Is Not Null) AND (([Loan Officer Details].Description)="" Or ([Loan Officer Details].Description) Is Null))
  5. ORDER BY [Loan Officer Details].[Clerk ID];
The fourth column of this query is designed to to give you a count of the number of duplications that exist for each Clerk ID shown in column 1 for each row where either one or the other columns have data in them or not.

The fifth column is designed to reflect the minimum ROWID number for that duplication set. Logic can ultimately be developed on that, much as to say something like this:

"Loop through the Loan Officer Details table and delete all records for the given Clerk ID displayed in this query row but only where the Position and Description fields are equal to the Position and Description fields shown in this query AND only where the ROWID is greater than the minimum RowID number that this query is displaying".

If this logic agrees with you and your stored data (the content of which you will see aggregated by ClerkID when you run this query) then we can advance to writing something that pulls out data, traverses it and deletes records according to that logic.

Theres no point in doing anything if my assumptions are wrong or if your data shows you something that you didnt expect? and which might get inadavertently deleted if any program flow didnt take it into account.

Regards

Jim :)

Hi Jim,

I haven't yet tried your query yet.I want to add one more condition to the delete. Once I find the duplicates if the Description has the text "Emergency" I want to delete that record.

Example,

12345678, Loan Officer, Emergency Loan Person
12345678, , Special Loan

In this case even though the second column is blank for the 2nd record, I want to delete the 1st record which has the text "Emengency" in the Description field.

Sorry about not mentioning this before.

Thanks.
Nov 4 '07 #5
Jim Doherty
897 Expert 512MB
Hi Jim,

I haven't yet tried your query yet.I want to add one more condition to the delete. Once I find the duplicates if the Description has the text "Emergency" I want to delete that record.

Example,

12345678, Loan Officer, Emergency Loan Person
12345678, , Special Loan

In this case even though the second column is blank for the 2nd record, I want to delete the 1st record which has the text "Emengency" in the Description field.

Sorry about not mentioning this before.

Thanks.

Thats ok, I got that . Just try that query when you have time and get back to me. No point in coding anything until we can satisfy ourselves that we are happy with everything. Wouldn't want anything to bite us in the rear.

Jim :)
Nov 4 '07 #6
Using your idea I was able to group my table more efficiently, Now my real table looks like this,

Fields :- Job Num, Seq Num, Vendor Last Name, Vendor First Name, Vendor Middle Initial, Agency, TranDate

Data will be as follows,

27809, 1, MILLER , , , Young Men's Association , 12/21/2002
56634, 1, PEOPLES, , , People Inc , 12/02/2003
22345, 2, COX , , , Family Business Corporation, 11/19/2003
22345, 2, COX , , , Family Links , 11/19/2003

My idea to seperate the duplicates is as follows,

First look at TranDate to see if there are any duplicates
IF found,
Look to see if Job Num+Seq Num gives any duplicates
If Yes,
Those rows are duplicates for that TranDate .

Basically, what I am trying to do is eliminate duplicate Job Num/Seq Num for a given TranDate. Multiple data entry people has entered the same information 2 or more times. I am trying to first find those duplicates now. My next step will be to identify the correct row to be deleted.


Thanks
Nov 5 '07 #7
Jim Doherty
897 Expert 512MB
Using your idea I was able to group my table more efficiently, Now my real table looks like this,

Fields :- Job Num, Seq Num, Vendor Last Name, Vendor First Name, Vendor Middle Initial, Agency, TranDate

Data will be as follows,

27809, 1, MILLER , , , Young Men's Association , 12/21/2002
56634, 1, PEOPLES, , , People Inc , 12/02/2003
22345, 2, COX , , , Family Business Corporation, 11/19/2003
22345, 2, COX , , , Family Links , 11/19/2003

My idea to seperate the duplicates is as follows,

First look at TranDate to see if there are any duplicates
IF found,
Look to see if Job Num+Seq Num gives any duplicates
If Yes,
Those rows are duplicates for that TranDate .

Basically, what I am trying to do is eliminate duplicate Job Num/Seq Num for a given TranDate. Multiple data entry people has entered the same information 2 or more times. I am trying to first find those duplicates now. My next step will be to identify the correct row to be deleted.


Thanks
On the basis of that description I'll leave it with you. You can get a simple Access duplicates query to do that for you

Jim
Nov 5 '07 #8

Post your reply

Sign in to post your reply or Sign up for a free account.

Similar topics

27 posts views Thread by didacticone | last post: by
1 post views Thread by gcdp | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.