473,770 Members | 2,217 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Data Cleaning

Since I just started my new work, I have inherited a MS Access
database which has nearly 13000 records in a single table. Now, my
mandate is to clean the database and maybe split the table into two.
Can anyone please give me some ideas on how to clean duplicated
records without going through all the records.
Any help or idea will be greatly appreciated? I am not much of a tech
whiz but I am quite familiar with databases. I haven't dealt with a MS
access database with 13000 records.
Nov 13 '05 #1
12 9867
what are the fields???
---
Please immediately let us know (by phone or return email) if (a) this email
contains a virus
(b) you are not the intended recipient
(c) you consider this email to be spam.
We have done our utmost to make sure that
none of the above are applicable. THANK YOU
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.690 / Virus Database: 451 - Release Date: 22/05/2004
Nov 13 '05 #2
Hi there,

you could try using the DISTINCTROW option in an SQL query. This will
only select the unique rows/records from a table and will not return
duplicate rows/records. (look up Access help if you don't know how to
make a select query)

for example (using my own DB)

SELECT DISTINCTROW tblLessonDetail s.ID, tblLessonDetail s.Grade,
tblLessonDetail s.Rank, tblLessonDetail s.LessonNumber,
tblLessonDetail s.LessonName, tblLessonDetail s.Activity INTO tblCleaned
FROM tblLessonDetail s;

This will select the unique records and copy them into a new table
using the MAKETABLE 'INTO' option. You will then end up with a new
table minus the duplicate records.

If your table contains a unique ID (as it probably does) it may be
better to select all the values in the table except for the ID field
and then use the DISTINCT option in the SQL statement instead of the
DISTINCTROW option. This will return only records with unique values
based on the fields specified in the query design grid.

i.e

SELECT DISTINCT tblLessonDetail s.ID, tblLessonDetail s.Grade,
tblLessonDetail s.Rank, tblLessonDetail s.LessonNumber,
tblLessonDetail s.LessonName, tblLessonDetail s.Activity INTO tblCleaned
FROM tblLessonDetail s;

This is another way to do it without editing the SQL (from Access
help)

1.Open a query in Design view.
2.Select the query by clicking anywhere in query Design view outside
the design grid and the field lists.
3.Click Properties on the toolbar to display the query's property
sheet.

Do one of the following:
4.Prevent showing duplicate records in a query based on fields in the
underlying table or query
* Set the UniqueRecords property to Yes.

5.Prevent showing duplicate records in a query based on fields in the
query design grid
* Set the UniqueValues property to Yes.

remember to choose Query>Make Table Query from the main menu toolbar

hope this helps,

David Thomas.

sp*****@usc.edu (Santosh) wrote in message news:<6a******* *************** ****@posting.go ogle.com>...
Since I just started my new work, I have inherited a MS Access
database which has nearly 13000 records in a single table. Now, my
mandate is to clean the database and maybe split the table into two.
Can anyone please give me some ideas on how to clean duplicated
records without going through all the records.
Any help or idea will be greatly appreciated? I am not much of a tech
whiz but I am quite familiar with databases. I haven't dealt with a MS
access database with 13000 records.

Nov 13 '05 #3
Santosh wrote:
Since I just started my new work, I have inherited a MS Access
database which has nearly 13000 records in a single table. Now, my
mandate is to clean the database and maybe split the table into two.
Can anyone please give me some ideas on how to clean duplicated
records without going through all the records.
Any help or idea will be greatly appreciated? I am not much of a tech
whiz but I am quite familiar with databases. I haven't dealt with a MS
access database with 13000 records.


Before you go messing around with the database, I recommend that you
make a backup of it. Maybe make a copy of it and play around with the
copy and experiment with it.

Most often you split a table into a 1 to many structure. For example,
you may have a table with customer name, address, city, state, zip and
then some other data. Basically you would create a record for the
customer (1 side) and the other data on the many side.

Then you create forms and reports with the two tables.

Nov 13 '05 #4
Salad <oi*@vinegar.co m> wrote in message news:<gf******* ***********@new sread1.news.pas .earthlink.net> ...

Before you go messing around with the database, I recommend that you
make a backup of it. Maybe make a copy of it and play around with the
copy and experiment with it.

Most often you split a table into a 1 to many structure. For example,
you may have a table with customer name, address, city, state, zip and
then some other data. Basically you would create a record for the
customer (1 side) and the other data on the many side.

Then you create forms and reports with the two tables.


Thanks a lot. I actually did the DISTINCT ROW and it worked out fine.
How can I add a search capability to ACCESS? Eventually, my co-workers
will need to find records from the databases? Any idea will be greatly
appreciated?
Nov 13 '05 #5
Santosh wrote:
Salad <oi*@vinegar.co m> wrote in message news:<gf******* ***********@new sread1.news.pas .earthlink.net> ...
Before you go messing around with the database, I recommend that you
make a backup of it. Maybe make a copy of it and play around with the
copy and experiment with it.

Most often you split a table into a 1 to many structure. For example,
you may have a table with customer name, address, city, state, zip and
then some other data. Basically you would create a record for the
customer (1 side) and the other data on the many side.

Then you create forms and reports with the two tables.

Thanks a lot. I actually did the DISTINCT ROW and it worked out fine.
How can I add a search capability to ACCESS? Eventually, my co-workers
will need to find records from the databases? Any idea will be greatly
appreciated?


Open up the query or form. Then check the menu bar and see if there is
a binoculars there. That's the quick and easy way.

Nov 13 '05 #6
From Acc 97 Help:

1 In the Database window, click the Queries tab, and then click New.
2 In the New Query dialog box, click Find Duplicates Query Wizard.
3 Click OK.
4 Follow the directions in the wizard dialog boxes. If you don't choose to
show fields in addition to those with duplicate values, the query results
will sum the instances of each duplicate value. In the last dialog box, you
can choose to run the query or see the query's structure in Design view.

Note Microsoft Access can automatically delete all the duplicate records
in a table, although you can't use the Find Duplicates Wizard to do it...

Automatically delete duplicate records from a table

In this procedure, you create a copy of the structure of the table that
contains duplicates, make primary keys of all the fields that contain
duplicates, and then run an append query from the original table to the new
table. Because fields that are primary keys can't contain duplicate records,
this procedure produces a table without duplicate records.

To create a new table

1 In the Database window, click the Tables tab.
2 Click the name of the table you want to delete duplicate records from.
3 Click Copy on the toolbar.
4 Click Paste on the toolbar.
5 In the Paste Table As dialog box, type a name for the copied table, click
Structure Only, and then click OK.
6 Open the new table in Design view, and select the field(s) that contained
duplicates in the table you copied.

7 Click Primary Key on the toolbar to create a primary key based on the
selected fields.
8 Save and close the table.

To append only unique records to the new table

1 Create a new query based on the original table containing duplicates.
2 In query Design view, click the Query Type button on the toolbar, and
then click Append Query.
3 In the Append dialog box, click the name of the new table from the Table
Name list, and then click OK.
4 Include all the fields from the original table by dragging the asterisk
(*) to the query design grid.

5 Click Run on the toolbar.
6 Click Yes when you receive the message that you're about to append rows.
7 Click Yes when you receive the message that Microsoft Access can't append
all the records in the append query. This transfers only unique records to
your new table and discards the duplicates.
8 To see the results, open the table from the Tables tab in the Database
window.

9 When you're sure the new table has the correct unique records, you can
delete the original table, and then rename the new table using the name of
the original table.

HTH
--
-Larry-
--

"Santosh" <sp*****@usc.ed u> wrote in message
news:6a******** *************** ***@posting.goo gle.com...
Since I just started my new work, I have inherited a MS Access
database which has nearly 13000 records in a single table. Now, my
mandate is to clean the database and maybe split the table into two.
Can anyone please give me some ideas on how to clean duplicated
records without going through all the records.
Any help or idea will be greatly appreciated? I am not much of a tech
whiz but I am quite familiar with databases. I haven't dealt with a MS
access database with 13000 records.

Nov 13 '05 #7
Salad <oi*@vinegar.co m> wrote in message news:<Ph******* **********@news read2.news.pas. earthlink.net>. ..
Santosh wrote:

How can I add a search capability to ACCESS? Eventually, my co-workers
will need to find records from the databases? Any idea will be greatly
appreciated?


Using Find or Filter in a form is good if there aren't too many
records, but my database is approaching 20,000 records and these
functions are slowing down. I'm working on a Query By Form record
finder. I made an unbound form in which my users enter criteria, then
it builds a query in code (querydef object) using the user's entry as
query criteria, and switches to a results form for which the query is
the source. I learned this approach from the sample queries at
Microsoft, and from a coding book "Beginning Access 2002 VBA" by Smith
and Sussman. The biggest snag I ran into was getting it to switch to
the results form and show the selected data, I ended up coding it to
close and re-open the results form each time, because the requery and
refresh functions don't work as advertised. Luckily this approach
worked better than I thought it would. It's almost done and it's much
faster than filter or find in the main form. I and one other user are
using it and we're very happy with the results.

There are many different approaches you can take depending on the
needs of your users. The examples I saw opened the query in datasheet
view instead of opening the data in a form. The example in the book
had check boxes and combo boxes for criteria selection, I thought
these were very cumbersome for our situation. I don't have enough
experience to generate a lot of ideas for you, but poke around on this
site and in books, and check out the MS examples. You might search
under "recordset" , "filter", "temporary table" and "query by form" for
ideas.
Nov 13 '05 #8
Julia Baresch wrote:
Salad <oi*@vinegar.co m> wrote in message news:<Ph******* **********@news read2.news.pas. earthlink.net>. ..
Santosh wrote:
How can I add a search capability to ACCESS? Eventually, my co-workers
will need to find records from the databases? Any idea will be greatly
appreciate d?

Using Find or Filter in a form is good if there aren't too many
records, but my database is approaching 20,000 records and these
functions are slowing down. I'm working on a Query By Form record
finder. I made an unbound form in which my users enter criteria, then
it builds a query in code (querydef object) using the user's entry as
query criteria, and switches to a results form for which the query is
the source. I learned this approach from the sample queries at
Microsoft, and from a coding book "Beginning Access 2002 VBA" by Smith
and Sussman. The biggest snag I ran into was getting it to switch to
the results form and show the selected data, I ended up coding it to
close and re-open the results form each time, because the requery and
refresh functions don't work as advertised. Luckily this approach
worked better than I thought it would. It's almost done and it's much
faster than filter or find in the main form. I and one other user are
using it and we're very happy with the results.


I sometimes use a form to filter records. Normally you are not going to
filter on each specific field on a form. I have 2 command buttons when
the op has selected the fields to filter on and the values. If the op
selects cancel, no filter takes place and the filter form simply closes.
If the op selectes the "Filter It" command button, I build a filter
string in the code behind the command button. Let's say we have 2
forms; MF = Main form, FF = FilterForm. The op presses the Filter
button on MF. THe code may be something like
Docmd.OpenForm "FF"

The FF form opens up and the op fills in the data to be filtered. When
the FilterIt button is pressed in FF, I scan the fields. Ex:

Sub CommandFilterIt _Click()
'I am filtering on 3 fields; Name, DateField, and Number
Dim strF As String
If Not IsNull(Me.Name) Then
strF = "Name = '" & Me.Name & "' And "
Endif
If Not IsNull(Me.Date) Then
strF = strF & "DateField = #" Me.Date & "# And "
ENdif
If Me.Number > 0 Then
strF = strF & "Number = " Me.Number & " And "
Endif

'remove the word And at the end of the filter line if it exists
If strF > "" Then strF = Left(strF,Len(s trF)-5)

'now I set the filter on the main form
Forms!MF.Form.F ilter = strF

'and I turn the filter on or off depending on if
'the filter string has some information
Forms!MF.Form.F ilterOn = (strF > "")

'and I close the filter form
Docmd.Close acForm, "FF"
End Sub

Setting the filter in FF for MF would be preferable, in my opinion, then
closing the form and reopening it with a new query/recordsource.

There are many different approaches you can take depending on the
needs of your users. The examples I saw opened the query in datasheet
view instead of opening the data in a form. The example in the book
had check boxes and combo boxes for criteria selection, I thought
these were very cumbersome for our situation. I don't have enough
experience to generate a lot of ideas for you, but poke around on this
site and in books, and check out the MS examples. You might search
under "recordset" , "filter", "temporary table" and "query by form" for
ideas.


Nov 13 '05 #9
Thanks Larry, it was of great help. I removed all duplicates and there
were tons of it.
I am looking a develop a more advanced search capability for the
database, specifying certain criteria like name, and other. The
binocular works great but I want to develop a more robust search
engine that takes in more criteria thanj just last name, I want a
combination of criterias. Please, can you shed some light on it.
Thanks,

SP

"Larry Daugherty" <La************ ********@verizo n.net> wrote in message news:<kE******* ***********@nwr ddc01.gnilink.n et>...
From Acc 97 Help:

1 In the Database window, click the Queries tab, and then click New.
2 In the New Query dialog box, click Find Duplicates Query Wizard.
3 Click OK.
4 Follow the directions in the wizard dialog boxes. If you don't choose to
show fields in addition to those with duplicate values, the query results
will sum the instances of each duplicate value. In the last dialog box, you
can choose to run the query or see the query's structure in Design view.

Note Microsoft Access can automatically delete all the duplicate records
in a table, although you can't use the Find Duplicates Wizard to do it...

Automatically delete duplicate records from a table

In this procedure, you create a copy of the structure of the table that
contains duplicates, make primary keys of all the fields that contain
duplicates, and then run an append query from the original table to the new
table. Because fields that are primary keys can't contain duplicate records,
this procedure produces a table without duplicate records.

To create a new table

1 In the Database window, click the Tables tab.
2 Click the name of the table you want to delete duplicate records from.
3 Click Copy on the toolbar.
4 Click Paste on the toolbar.
5 In the Paste Table As dialog box, type a name for the copied table, click
Structure Only, and then click OK.
6 Open the new table in Design view, and select the field(s) that contained
duplicates in the table you copied.

7 Click Primary Key on the toolbar to create a primary key based on the
selected fields.
8 Save and close the table.

To append only unique records to the new table

1 Create a new query based on the original table containing duplicates.
2 In query Design view, click the Query Type button on the toolbar, and
then click Append Query.
3 In the Append dialog box, click the name of the new table from the Table
Name list, and then click OK.
4 Include all the fields from the original table by dragging the asterisk
(*) to the query design grid.

5 Click Run on the toolbar.
6 Click Yes when you receive the message that you're about to append rows.
7 Click Yes when you receive the message that Microsoft Access can't append
all the records in the append query. This transfers only unique records to
your new table and discards the duplicates.
8 To see the results, open the table from the Tables tab in the Database
window.

9 When you're sure the new table has the correct unique records, you can
delete the original table, and then rename the new table using the name of
the original table.

HTH
--
-Larry-
--

"Santosh" <sp*****@usc.ed u> wrote in message
news:6a******** *************** ***@posting.goo gle.com...
Since I just started my new work, I have inherited a MS Access
database which has nearly 13000 records in a single table. Now, my
mandate is to clean the database and maybe split the table into two.
Can anyone please give me some ideas on how to clean duplicated
records without going through all the records.
Any help or idea will be greatly appreciated? I am not much of a tech
whiz but I am quite familiar with databases. I haven't dealt with a MS
access database with 13000 records.

Nov 13 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
1252
by: Ellen K. | last post by:
What tools has everyone used for cleaning name and address data (including identifying not-immediately-obvious duplicates) in connection with a CRM project or the Customer dimension of a data warehouse? What did you like/dislike about the tool you used? How customizable was the tool you used?
1
1535
by: Matej Cepl | last post by:
Hi, can anybody help me with the cleaning of really messy HTML from the news site into really clean XHTML, which I would like to then analyze with some qualitative analysis (probably exporting to plain ASCII in meantime, but not necessarily). I can do some little cleaning by hand, but when there some hundreds of webpages, I hoped that I could create some XSL stylesheet for conversion.
7
1949
by: Luc Tremblay | last post by:
Given the typical following code: void Listener::HandleEvent(const Event& event) { // handling code } In a "clean" fashion, how is it possible to add custom data (to be subsequently accessed) to the Event instance? By custom data i mean practically anything, from a class to a single int. Particularly to my case,
4
3166
by: teddysnips | last post by:
This is a rather abstract question about data design, but I ask it here because a) the database is SQL Server, and b) you're such a learned bunch! Let's assume the classic relation of Customers and Orders, where an Order may reference a single Customer. If I was designing such a relation from scratch, I would create the Customer table with an Identity column and call it CustomerID. The Order table would contain a column called...
3
2296
by: Pierre Saint-Jacques | last post by:
DB2 V8.2 has a new envir. var. DB2_USE_ALTERNATE_PAGE_CLEANING=YES The docs. mention that this will make DB2 ignore chngpgs_thresh and use softmax to even the rate of writing out of the bp's. What agent process then takes over. On a workload of 10000 trans./min., I get over 9000 bp's writes out of my snapshot. However, LSAN Gap Cleaners, Thereshold Cleansers and Victim Page Cleaners are all = 0
1
2111
by: Mikey | last post by:
Do NativeWindows get destroyed when the app shuts down? While my app is running I can Spy++ and see my NativeWindow handle. After app shutdown it's no longer there. But this is confusing because if I Spy++ on the messages, and do a DestroyHandle, I see a WM_DESTROY, but do not see this message when the app shuts down. I presume the window has indeed been destroyed, because Spy++ doesn't pick it up.
18
2757
by: Joel Hedlund | last post by:
Hi! The question of type checking/enforcing has bothered me for a while, and since this newsgroup has a wealth of competence subscribed to it, I figured this would be a great way of learning from the experts. I feel there's a tradeoff between clear, easily readdable and extensible code on one side, and safe code providing early errors and useful tracebacks on the other. I want both! How do you guys do it? What's the pythonic way? Are...
4
3310
by: HLCruz via AccessMonster.com | last post by:
I am working with a database that has client information separated in to 4 related tables - tFolder, tAddress, tEmail, tPhone number. In addition there are related tables tGifts and tCalls. The database has roughly 22,000 records but should only have around 6,000. The remaining records are duplicates, but in many cases the correct data for one person is spread out between the duplicate records and related tables. I need to be able to...
0
1206
by: Now You Know | last post by:
Carpet Cleaners Los Angeles Home Carpet Rug Upholstery Cleaning Phone 1 310 925 1720 OR 1-818-386-1022 Local Call California Wide We offer carpet cleaning services such as; Steam Cleaning, Dry Cleaning, Fabric Lounge Suite Cleaning, Leather Lounge Suite Cleaning, Tile & Grout Cleaning, Mattress Cleaning, Wet Carpet / Water Damage Restoration for: offices, homes, restaurants, clubs and hotels http://carpetcleanersorangecounty.blogspot.com/...
0
9454
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
1
10038
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9910
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8933
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7460
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6712
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5354
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5482
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3609
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.