
delete duplicates after import

New member here, impressed with what I've found searching so far...

So every day, data will be imported into a table named for the month (e.g. 10_2011)... however, this data WILL contain duplicates, which I need to get rid of. I am breaking the data up into daily imports into monthly tables because each day's import runs 500-700 rows, and I wanted to keep the tables a manageable size. I'm not married to this logic, so if someone has a better idea, please share!

I have used snippets of code I found here, but I can't seem to make it work... If someone could help me, that would be great. I'm not a board-certified programmer, but I do know enough to break stuff!

  1. Private Sub cmdImport_Click()
  2.     Dim strDay As String
  3.     Dim strMonth As String
  4.     Dim strFilePath As String
  5.     Dim xlobj As Object                'Excel workbook, late bound
  6.     Dim wsobj As Object                'worksheet within it
  7.     Dim rngobj As Object               'used range on that sheet
  8.     Dim strDataRange As String
  9.  
  10.     strDay = Format(Now(), "mm" & "." & "dd" & "." & "yyyy")
  11.     strMonth = Format(Now(), "mm" & "_" & "yyyy")
  12.  
  13.     strFilePath = "<hidden>" & strDay & ".xls"
  14.  
  15.     Set xlobj = GetObject(strFilePath)
  16.     Set wsobj = xlobj.Worksheets("Sheet1")
  17.     Set rngobj = wsobj.UsedRange
  18.     strDataRange = "DataRange": rngobj.Name = strDataRange   'name the used range
  19.     xlobj.Save                         'save so the new name is in the file on disk
  20.     DoCmd.SetWarnings False
  21.     DoCmd.TransferSpreadsheet acImport, 8, strMonth, _
  22.         strFilePath, True, strDataRange   '8 = acSpreadsheetTypeExcel9 (.xls)
  23.     DoCmd.SetWarnings True
  24.  
  25.     xlobj.Close False                  'close the workbook GetObject opened
  26.     Set xlobj = Nothing
  27.  
  28.  
  29. 'routine for deleting Dups created EVERY DAY when excel is imported
  30.     Dim db As DAO.Database
  31.     Dim rs As DAO.Recordset
  32.     Dim strAccount As String
  33.  
  34.     Set db = CurrentDb
  35.     Set rs = db.OpenRecordset(strMonth)
  36.  
  37. StartFile:
  38.     rs.MoveFirst
  39.     Do Until rs.EOF                    'scan for the first record not yet processed
  40.         If rs!Temp = False Then Exit Do
  41.         rs.MoveNext
  42.     Loop
  43.     If rs.EOF Then GoTo EndFile        'every record processed: finished
  44.  
  45.     strAccount = rs!Account_No         'remember this record's key
  46.     rs.Edit
  47.     rs!Temp = True                     'mark it as processed
  48.     rs.Update
  49.     rs.MoveNext
  50.  
  51.     Do Until rs.EOF                    'delete every later record with the same key
  52.         If rs!Account_No = strAccount Then
  53.             rs.Delete
  54.         End If
  55.         rs.MoveNext
  56.     Loop
  57.  
  58.     GoTo StartFile                     'repeat for the next unprocessed record
  59. EndFile:
  60.  
  61.  'remove temp field from table
  62.  
  63.  
  64.     rs.Close
  65.     Set rs = Nothing
  66.     Set db = Nothing
  67.  
  68. End Sub
  69.  
Oct 4 '11 #1
patjones
931 Expert 512MB
In regard to your data structure, I probably would not keep so many tables around that hold exactly the same kind of data. Depending on how much information each record holds, Access can comfortably handle tens of thousands of records in a single table, or more. This is a digression from the point of your question, though.

I want to start by asking what the Temp column holds, and what you are looking to accomplish by looping through the recordset until rs!Temp is false (lines 38 through 45)?

I see at least one issue further on in the code, but if you can answer this first then we can move on...

Pat
Oct 4 '11 #2
That was a code snippet that I "borrowed" from someone else... If there is a better procedure for looping through the data I would love to see it!

I am very much a beginner, so I could totally rip out that part of the code. As you can see, the procedure runs the import and then is supposed to check for and delete the dups.
Oct 4 '11 #3
patjones
931 Expert 512MB
Generally, a copy and paste of code from some other source isn't successful without some degree of modification to fit the code to your particular situation.

So I go back to my previous questions: what is the purpose of the Temp column and why are you looping through the recordset looking for rs!Temp = False?

I'm asking because it's not clear to me how that particular section of code is related to checking for duplicates in your table, so you need to be able to explain how it's relevant.
Oct 4 '11 #4
NeoPa
32,556 Expert Mod 16PB
My first advice would be to avoid the idea of multiple tables for similar data. You're not working with spreadsheets here. You're working with a database, and that requires some different thinking. A field that reflects the month will enable you to store all the data and report on it easily by month or otherwise. Filtering by month gives you the same result as multiple tables without all the restrictions and mess that go along with that approach.

As far as handling duplicates is concerned I would decide first what you understand by a duplicate. That's an important first step, without which all the rest of it is fairly meaningless.

With that new understanding in mind, I would proceed on the following basis (a rough sketch follows the list):
  1. Import the day's data into a special import table.
  2. Remove any duplicates from the new data (If this involves comparing with previous days' data from the same month then so be it).
  3. Report on or log any removals or updates if that fits your requirements.
  4. Append the newly imported and tidied data into the main table, ensuring the month field is set appropriately.
  5. Clear down the temporary data from the daily import table for the next run.
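To make that concrete, the sketch below covers steps 1, 4 and 5 in VBA. The table names (tblDailyImport, tblMain), the field names and the file path are placeholders rather than anything from your database, so substitute your own:

    Private Sub cmdDailyImport_Click()
        ' 1. Import the day's spreadsheet into a staging table.
        DoCmd.TransferSpreadsheet acImport, 8, "tblDailyImport", _
            "C:\Imports\Daily.xls", True

        ' ...steps 2 and 3: de-duplicate the staging table and log removals...

        ' 4. Append the tidied rows to the main table, stamping each with the month.
        CurrentDb.Execute _
            "INSERT INTO tblMain (Account_No, JobDate, ImportMonth) " & _
            "SELECT Account_No, JobDate, Format(Date(), 'mm_yyyy') " & _
            "FROM tblDailyImport;", dbFailOnError

        ' 5. Clear the staging table ready for the next run.
        CurrentDb.Execute "DELETE FROM tblDailyImport;", dbFailOnError
    End Sub

The month stamp is what lets one table replace the monthly tables: filtering on ImportMonth gives you exactly what a 10_2011 table would have held.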
Oct 5 '11 #5
NeoPa
32,556 Expert Mod 16PB
Another point worth understanding so you can avoid it in future, as it's surprisingly common among less experienced coders: it is never necessary to concatenate literal string values together for the benefit of the code. Take your line #10:
    strDay = Format(Now(), "mm" & "." & "dd" & "." & "yyyy")
There is absolutely no reason why this should not be written much more simply as:
    strDay = Format(Date(), "mm.dd.yyyy")
Of course there are many reasons why it should be done that way.

NB. Using Now() instead of Date() for date values is another mistake liable to cause confusion (and well worth avoiding).
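A quick illustration of the difference (the printed values are only examples):

    Debug.Print Now()            ' e.g. 10/05/2011 2:37:14 PM - includes a time portion
    Debug.Print Date             ' e.g. 10/05/2011            - the date alone
    Debug.Print (Now() = Date)   ' False, except at exactly midnight

Any comparison of a date-only value against Now() inherits that stray time portion, which is where the confusion tends to creep in.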
Oct 5 '11 #6
The thing that is hard for me to wrap my head around is using 1 table for ALL the daily imports forever more... We're talking over 500 rows of data per day. In a month that would be over 10,000 rows, meaning over 120,000 rows per year! That seems like it would be way too big for Access to handle. Am I wrong in thinking that?

So the daily order of operations as you see it would be: import data into a "temp" table, compare the temp data to the current "master" data table, make duplicate corrections and any data cleanup, append the data from the temp table to the master table, and clear the temp table to be used the next day. Correct?
Oct 5 '11 #7
NeoPa
32,556 Expert Mod 16PB
jroycestone:
The thing that is hard for me to wrap my head around is using 1 table for ALL the daily imports forever more... We're talking over 500 rows of data per day. In a month that would be over 10,000 rows, meaning over 120,000 rows per year! That seems like it would be way too big for Access to handle. Am I wrong in thinking that?
Yes. Quite wrong. Certainly there are limits to the capacity of an Access database, but these limits are not avoided by managing a set amount of data in multiple tables. Quite the contrary, as each table has its own overheads. You will be able to manage more data in a single table than in multiple ones.

That said, if the data is vast and is never cleared down (increasing forever) it may well be wise to consider another database engine within which to store your data. It can still be accessed from an Access front-end database, but the size limitations would be less constraining. Something for you to consider.

jroycestone:
So the daily order of operations as you see it would be: import data into a "temp" table, compare the temp data to the current "master" data table, make duplicate corrections and any data cleanup, append the data from the temp table to the master table, and clear the temp table to be used the next day. Correct?
If you reread what I posted you will see I didn't say the highlighted bit (comparing the temp data to the current master table) as you have it. That may be the approach, but that depends on your definition of duplicate data. It may well be that you only need to check for duplicates within the new Temp data. Otherwise it sounds like you have the idea pretty well.
Oct 5 '11 #8
patjones
931 Expert 512MB
There is some dependence on what kind of data the table is going to hold, but it's not inconceivable for an Access table to store in excess of 100,000 records. I have one database currently in excess of 30,000 unique records, and there really hasn't been a performance hit so far.

If you have the ability to put your data on a server and link your Access front end to it, you can go into millions of records.

At some point you will probably want to consider archiving the data somewhere else; but for the sake of proper database design you should use one table and add a column for month.

The overall process as you have explained it is pretty much how you would want to do things. I recently completed an application that allows the user to search for a spreadsheet using a file-picker dialog, import the selected file into a temporary table, remove duplicates, and commit the clean data to the permanent table... all with a couple of button clicks. The import is a little slow on account of the spreadsheets containing anywhere from 10,000 records up, but it works.
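A bare-bones version of that picker-plus-import step might look like the sketch below. It assumes Access 2002 or later (for FileDialog), and the staging table name tblTemp is just a placeholder:

    Private Sub cmdPickAndImport_Click()
        Dim dlg As Object                     ' Office FileDialog, late bound

        Set dlg = Application.FileDialog(3)   ' 3 = msoFileDialogFilePicker
        With dlg
            .AllowMultiSelect = False
            .Filters.Clear
            .Filters.Add "Excel workbooks", "*.xls"
            If .Show Then                     ' returns -1 when the user picks a file
                ' Import the chosen workbook into the staging table.
                DoCmd.TransferSpreadsheet acImport, 8, "tblTemp", _
                    .SelectedItems(1), True
            End If
        End With
    End Sub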
Oct 5 '11 #9
So here is my "tentative" plan... have one table to store all the data, and archive quarterly to keep the table size somewhat limited.

Now, the duplicate compare is needed as I described it. I'll explain a little more. The spreadsheet that gets imported daily has scheduled jobs for today and for dates in the future. So each day's import will contain jobs for tomorrow and beyond, and those "jobs" will continue to appear in the spreadsheet every day until they are out of our queue. Hopefully that explains how the data carry over from day to day.
Oct 5 '11 #10
NeoPa
32,556 Expert Mod 16PB
That still leaves out much of the information needed to develop logic that reflects your requirements. Let's see what we can be getting on with anyway, though.

Your post would indicate to me that you need to ensure that any job in the [Temp] table doesn't clash with any other job in the same table, as well as any job already existing in the main table. How do you recognise which of the duplicates is required? It seems you may be in a position where a job imported today has more up-to-date information with it than the same job imported on a previous day. In that case you'd need to remove the record from the main table. Somewhat different from simply tidying up the data in the [Temp] table, but can still be done.
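If it does come to removing superseded records from the main table, one way is a delete query keyed on whatever field(s) define a duplicate for you. In the sketch below, tblMaster and tblTemp are placeholder names and Account_No stands in for your real key:

    ' Drop any main-table record that has a fresher copy in the Temp table;
    ' the tidied Temp rows can then be appended without clashing.
    CurrentDb.Execute _
        "DELETE FROM tblMaster WHERE EXISTS " & _
        "(SELECT 1 FROM tblTemp " & _
        "WHERE tblTemp.Account_No = tblMaster.Account_No);", dbFailOnError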
Oct 5 '11 #11
Jeeez, you must be experienced with data, as I didn't think of that either!

Yes, I would need the latest "version" of a row of data to overwrite the old version. There is no possible way there would be duplicates within the [Temp] table itself; they only arise when comparing data from the [Temp] table against the [Master] table.

There are multiple fields that need to be checked to make sure the data are truly duplicate; Account_No and OrderDate are the two main ones.
Oct 5 '11 #12
NeoPa
32,556 Expert Mod 16PB
So, the [Temp] table never needs to be checked for internal consistency, but it may contain new records as well as updates for existing ones.

Jet SQL has no facility for ...

WOW!! I just learned something new and very surprising when I researched this. It turns out Jet SQL can and does handle updating and appending in the same UPDATE query. That makes the suggestion somewhat simpler :

Create an UPDATE query that specifies all fields except any AutoNumbers (those need to be left to the automatic process and are unlikely to match any previous data anyway). A link I found on how to do this is ACC2000: How to Update and Append Records in One Update Query.
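Applied to the key fields mentioned above, the query from that article would look something like the sketch below. tblMaster and tblTemp are placeholder table names, and JobStatus stands in for whatever non-key fields you carry:

    Dim strSQL As String

    ' Rows matching on both key fields are overwritten with the latest import;
    ' Temp rows with no match in tblMaster are appended as brand-new records.
    strSQL = "UPDATE tblMaster RIGHT JOIN tblTemp " & _
             "ON (tblMaster.Account_No = tblTemp.Account_No) " & _
             "AND (tblMaster.OrderDate = tblTemp.OrderDate) " & _
             "SET tblMaster.Account_No = tblTemp.Account_No, " & _
             "tblMaster.OrderDate = tblTemp.OrderDate, " & _
             "tblMaster.JobStatus = tblTemp.JobStatus;"

    CurrentDb.Execute strSQL, dbFailOnError

An update query can't be undone, so it's worth backing the table up before the first trial run.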
Oct 5 '11 #13
Awesome... that seems to work perfectly! Thank you so much for your help NeoPa!!!!!!
Oct 5 '11 #14
NeoPa
32,556 Expert Mod 16PB
Thank you for the question. I've always worked on the assumption that this facility was non-existent. It's a pretty handy thing to know about and I'm very pleased to have discovered it :-)
Oct 5 '11 #15
