Conditional Delete of Duplicate Records from Access

I have a access query that has more than 400000 records including duplicates.First I need help in finding a way to filter the whole 400000 records and list all duplicates.
Lets say ID, Job & Description are the columns that make duplicates. For example, see the 2 rows below,

12345678, Cashier, Emergency Money
12345678, Cashier

I want to delete the second row because the its third column is blank.

Sometimes the rows could be like this too,

89898989, , Financial Aid
89898989, Accountant, Financial Aid

In this example, I want to delete the first row because its second column is blank.

Any help will be really appreciated.

Nov 3 '07 #1

Subscribe Post Reply

5525

Jim Doherty

897

Expert 512MB

I have a access query that has more than 400000 records including duplicates.First I need help in finding a way to filter the whole 400000 records and list all duplicates.
Lets say ID, Job & Description are the columns that make duplicates. For example, see the 2 rows below,

12345678, Cashier, Emergency Money
12345678, Cashier

I want to delete the second row because the its third column is blank.

Sometimes the rows could be like this too,

89898989, , Financial Aid
89898989, Accountant, Financial Aid

In this example, I want to delete the first row because its second column is blank.

Any help will be really appreciated.

Hi AccessHunter,

Post your table name and fieldnames please for the table on which you wish to perfom the deletions that way any contributors can properly reference them in any coded responses given

Thanks

Jim :)

Nov 3 '07 #2

AccessHunter

Hi AccessHunter,

Post your table name and fieldnames please for the table on which you wish to perfom the deletions that way any contributors can properly reference them in any coded responses given

Thanks

Jim :)

Sorry abotu not mentioning the correct table and column names.
Table Name - Loan Officer Details
Column names - Clerk ID, Position, Description

My requirement is to find any duplicate rows for (Clerk ID + Position + Description) and then delete the rows as per my previous email.

Thanks for the help.

Nov 3 '07 #3

Jim Doherty

897

Expert 512MB

Sorry abotu not mentioning the correct table and column names.
Table Name - Loan Officer Details
Column names - Clerk ID, Position, Description

My requirement is to find any duplicate rows for (Clerk ID + Position + Description) and then delete the rows as per my previous email.

Thanks for the help.

Given your datasize it might be wise to run this SQL over your table first to determine how many duplications you have where either one or the other columns ie Position or Description is null or zero length string.

In order for you to do this you must add an extra field to the table called RowID and set its datatype to autonumber (which will automatically populate it with data when you switch to table view) or 'Number' datatype in which case you must populate it yourself with sequential numbers in order to identify each individual row. The identification of each row is important because you would need to identify where you are within the recordset during any subsequent traversing of the data within the SQL.

The assumption I have made here is that IF ALL THREE fields are complete then duplications in that regard are OK and you would not want to delete those records, as they are complete and form records on the many side of a one to many relationship with another table? I don't know your database schema.

Have a look at this query first see what it shows you and then give me an idea on the number of rows it returns, highest and lowest counts etc.

Expand|Select|Wrap|Line Numbers

 SELECT [Loan Officer Details].[Clerk ID], [Loan Officer Details].Position, [Loan Officer Details].Description, Count([Loan Officer Details].RowID) AS Duplications, Min([Loan Officer Details].RowID) AS MinimumRow

FROM [Loan Officer Details]

GROUP BY [Loan Officer Details].[Clerk ID], [Loan Officer Details].Position, [Loan Officer Details].Description

HAVING ((([Loan Officer Details].Position)="" Or ([Loan Officer Details].Position) Is Null) AND (([Loan Officer Details].Description)<>"" Or ([Loan Officer Details].Description) Is Not Null)) OR ((([Loan Officer Details].Position)<>"" Or ([Loan Officer Details].Position) Is Not Null) AND (([Loan Officer Details].Description)="" Or ([Loan Officer Details].Description) Is Null))

ORDER BY [Loan Officer Details].[Clerk ID];

The fourth column of this query is designed to to give you a count of the number of duplications that exist for each Clerk ID shown in column 1 for each row where either one or the other columns have data in them or not.

The fifth column is designed to reflect the minimum ROWID number for that duplication set. Logic can ultimately be developed on that, much as to say something like this:

"Loop through the Loan Officer Details table and delete all records for the given Clerk ID displayed in this query row but only where the Position and Description fields are equal to the Position and Description fields shown in this query AND only where the ROWID is greater than the minimum RowID number that this query is displaying".

If this logic agrees with you and your stored data (the content of which you will see aggregated by ClerkID when you run this query) then we can advance to writing something that pulls out data, traverses it and deletes records according to that logic.

Theres no point in doing anything if my assumptions are wrong or if your data shows you something that you didnt expect? and which might get inadavertently deleted if any program flow didnt take it into account.

Regards

Jim :)

Nov 4 '07 #4

AccessHunter

Given your datasize it might be wise to run this SQL over your table first to determine how many duplications you have where either one or the other columns ie Position or Description is null or zero length string.

In order for you to do this you must add an extra field to the table called RowID and set its datatype to autonumber (which will automatically populate it with data when you switch to table view) or 'Number' datatype in which case you must populate it yourself with sequential numbers in order to identify each individual row. The identification of each row is important because you would need to identify where you are within the recordset during any subsequent traversing of the data within the SQL.

The assumption I have made here is that IF ALL THREE fields are complete then duplications in that regard are OK and you would not want to delete those records, as they are complete and form records on the many side of a one to many relationship with another table? I don't know your database schema.

Have a look at this query first see what it shows you and then give me an idea on the number of rows it returns, highest and lowest counts etc.

Expand|Select|Wrap|Line Numbers

SELECT [Loan Officer Details].[Clerk ID], [Loan Officer Details].Position, [Loan Officer Details].Description, Count([Loan Officer Details].RowID) AS Duplications, Min([Loan Officer Details].RowID) AS MinimumRow

FROM [Loan Officer Details]

GROUP BY [Loan Officer Details].[Clerk ID], [Loan Officer Details].Position, [Loan Officer Details].Description

HAVING ((([Loan Officer Details].Position)="" Or ([Loan Officer Details].Position) Is Null) AND (([Loan Officer Details].Description)<>"" Or ([Loan Officer Details].Description) Is Not Null)) OR ((([Loan Officer Details].Position)<>"" Or ([Loan Officer Details].Position) Is Not Null) AND (([Loan Officer Details].Description)="" Or ([Loan Officer Details].Description) Is Null))

ORDER BY [Loan Officer Details].[Clerk ID];

The fourth column of this query is designed to to give you a count of the number of duplications that exist for each Clerk ID shown in column 1 for each row where either one or the other columns have data in them or not.

The fifth column is designed to reflect the minimum ROWID number for that duplication set. Logic can ultimately be developed on that, much as to say something like this:

"Loop through the Loan Officer Details table and delete all records for the given Clerk ID displayed in this query row but only where the Position and Description fields are equal to the Position and Description fields shown in this query AND only where the ROWID is greater than the minimum RowID number that this query is displaying".

If this logic agrees with you and your stored data (the content of which you will see aggregated by ClerkID when you run this query) then we can advance to writing something that pulls out data, traverses it and deletes records according to that logic.

Theres no point in doing anything if my assumptions are wrong or if your data shows you something that you didnt expect? and which might get inadavertently deleted if any program flow didnt take it into account.

Regards

Jim :)

Hi Jim,

I haven't yet tried your query yet.I want to add one more condition to the delete. Once I find the duplicates if the Description has the text "Emergency" I want to delete that record.

Example,

12345678, Loan Officer, Emergency Loan Person
12345678, , Special Loan

In this case even though the second column is blank for the 2nd record, I want to delete the 1st record which has the text "Emengency" in the Description field.

Sorry about not mentioning this before.

Thanks.

Nov 4 '07 #5

Jim Doherty

897

Expert 512MB

Hi Jim,

I haven't yet tried your query yet.I want to add one more condition to the delete. Once I find the duplicates if the Description has the text "Emergency" I want to delete that record.

Example,

12345678, Loan Officer, Emergency Loan Person
12345678, , Special Loan

In this case even though the second column is blank for the 2nd record, I want to delete the 1st record which has the text "Emengency" in the Description field.

Sorry about not mentioning this before.

Thanks.

Thats ok, I got that . Just try that query when you have time and get back to me. No point in coding anything until we can satisfy ourselves that we are happy with everything. Wouldn't want anything to bite us in the rear.

Jim :)

Nov 4 '07 #6

AccessHunter

Using your idea I was able to group my table more efficiently, Now my real table looks like this,

Fields :- Job Num, Seq Num, Vendor Last Name, Vendor First Name, Vendor Middle Initial, Agency, TranDate

Data will be as follows,

27809, 1, MILLER , , , Young Men's Association , 12/21/2002
56634, 1, PEOPLES, , , People Inc , 12/02/2003
22345, 2, COX , , , Family Business Corporation, 11/19/2003
22345, 2, COX , , , Family Links , 11/19/2003

My idea to seperate the duplicates is as follows,

First look at TranDate to see if there are any duplicates
IF found,
Look to see if Job Num+Seq Num gives any duplicates
If Yes,
Those rows are duplicates for that TranDate .

Basically, what I am trying to do is eliminate duplicate Job Num/Seq Num for a given TranDate. Multiple data entry people has entered the same information 2 or more times. I am trying to first find those duplicates now. My next step will be to identify the correct row to be deleted.

Thanks

Nov 5 '07 #7

Jim Doherty

897

Expert 512MB

Using your idea I was able to group my table more efficiently, Now my real table looks like this,

Fields :- Job Num, Seq Num, Vendor Last Name, Vendor First Name, Vendor Middle Initial, Agency, TranDate

Data will be as follows,

27809, 1, MILLER , , , Young Men's Association , 12/21/2002
56634, 1, PEOPLES, , , People Inc , 12/02/2003
22345, 2, COX , , , Family Business Corporation, 11/19/2003
22345, 2, COX , , , Family Links , 11/19/2003

My idea to seperate the duplicates is as follows,

First look at TranDate to see if there are any duplicates
IF found,
Look to see if Job Num+Seq Num gives any duplicates
If Yes,
Those rows are duplicates for that TranDate .

Basically, what I am trying to do is eliminate duplicate Job Num/Seq Num for a given TranDate. Multiple data entry people has entered the same information 2 or more times. I am trying to first find those duplicates now. My next step will be to identify the correct row to be deleted.

Thanks

On the basis of that description I'll leave it with you. You can get a simple Access duplicates query to do that for you

Jim

Nov 5 '07 #8

by: Jouke Langhout | last post by:

Hello all! For quite some time now, I've got the following problem: Access won't close properly when a user closes the application. An ACCESS process stays active and that process can only be...

Microsoft Access / VBA

delete records using querydef with parameter deletes ALL records

by: hpi | last post by:

Hello, I have a table : Batch It contains fields batchnummer : Number (Long Integer) datum : Date/Time status : Number (Long Integer) nr_records : Number (Long Integer)

Microsoft Access / VBA

Query to delete duplicate records in staging table

by: ms | last post by:

Access 2000: I am trying to delete duplicate records imported to a staging table leaving one of the duplicates to be imported into the live table. A unique record is based on a composite key of 3...

Microsoft Access / VBA

How to delete duplicate records and leave one?

by: KT | last post by:

Is there any one click solution that would do the trick? I would like to create a button, so the person who maintains the database can perform clean up work to delete duplicate records which...

Microsoft Access / VBA

Delete Duplicate records

by: Theodore70 | last post by:

I am trying to delete duplicate records in an access table using VB. Not a programmer here, but this was one of the codes that I found on the web. I made some modifications to it, but it still did...

Microsoft Access / VBA

how to delete the records?

by: itoxic07 | last post by:

how to delete the records ? i am having problem deleting the records including the duplicate records from access database. I inserted a textbox on a form ,the name written in a textbox must be...

Visual Basic 4 / 5 / 6

Delete Duplicate record from SQL Server2000

by: rajeshkrsingh | last post by:

Hi friends, Step1- create table duplicate ( intId int, varName varchar(50) ) insert into duplicate(intId,varName) values(1,'rajesh') insert into duplicate(intId,varName) values(2,'raj12')...

Microsoft SQL Server

delete otherwise duplicate records based on differing values in one column

by: jmstur2 | last post by:

I have a table with what I consider duplicate records. Data in all columns are duplicate except for the date column, meaning that duplicate data was entered on different dates and those dates were...

Microsoft SQL Server

Delete Duplicate Records from a large table

by: Dilip1983 | last post by:

Hi All, I want to delete duplicate records from a large table. There is one index(INDEX_U1) on 4 columns(col1,col2,col3,col4) which is in unusable state. First of all when i tried to rebuild...

Oracle Database

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

C# / C Sharp

Conditional Delete of Duplicate Records from Access

Similar topics