By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
445,705 Members | 1,795 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 445,705 IT Pros & Developers. It's quick & easy.

Data Cleaning

P: n/a
Since I just started my new work, I have inherited a MS Access
database which has nearly 13000 records in a single table. Now, my
mandate is to clean the database and maybe split the table into two.
Can anyone please give me some ideas on how to clean duplicated
records without going through all the records.
Any help or idea will be greatly appreciated? I am not much of a tech
whiz but I am quite familiar with databases. I haven't dealt with a MS
access database with 13000 records.
Nov 13 '05 #1
Share this Question
Share on Google+
12 Replies


P: n/a
what are the fields???
---
Please immediately let us know (by phone or return email) if (a) this email
contains a virus
(b) you are not the intended recipient
(c) you consider this email to be spam.
We have done our utmost to make sure that
none of the above are applicable. THANK YOU
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.690 / Virus Database: 451 - Release Date: 22/05/2004
Nov 13 '05 #2

P: n/a
Hi there,

you could try using the DISTINCTROW option in an SQL query. This will
only select the unique rows/records from a table and will not return
duplicate rows/records. (look up Access help if you don't know how to
make a select query)

for example (using my own DB)

SELECT DISTINCTROW tblLessonDetails.ID, tblLessonDetails.Grade,
tblLessonDetails.Rank, tblLessonDetails.LessonNumber,
tblLessonDetails.LessonName, tblLessonDetails.Activity INTO tblCleaned
FROM tblLessonDetails;

This will select the unique records and copy them into a new table
using the MAKETABLE 'INTO' option. You will then end up with a new
table minus the duplicate records.

If your table contains a unique ID (as it probably does) it may be
better to select all the values in the table except for the ID field
and then use the DISTINCT option in the SQL statement instead of the
DISTINCTROW option. This will return only records with unique values
based on the fields specified in the query design grid.

i.e

SELECT DISTINCT tblLessonDetails.ID, tblLessonDetails.Grade,
tblLessonDetails.Rank, tblLessonDetails.LessonNumber,
tblLessonDetails.LessonName, tblLessonDetails.Activity INTO tblCleaned
FROM tblLessonDetails;

This is another way to do it without editing the SQL (from Access
help)

1.Open a query in Design view.
2.Select the query by clicking anywhere in query Design view outside
the design grid and the field lists.
3.Click Properties on the toolbar to display the query's property
sheet.

Do one of the following:
4.Prevent showing duplicate records in a query based on fields in the
underlying table or query
* Set the UniqueRecords property to Yes.

5.Prevent showing duplicate records in a query based on fields in the
query design grid
* Set the UniqueValues property to Yes.

remember to choose Query>Make Table Query from the main menu toolbar

hope this helps,

David Thomas.

sp*****@usc.edu (Santosh) wrote in message news:<6a**************************@posting.google. com>...
Since I just started my new work, I have inherited a MS Access
database which has nearly 13000 records in a single table. Now, my
mandate is to clean the database and maybe split the table into two.
Can anyone please give me some ideas on how to clean duplicated
records without going through all the records.
Any help or idea will be greatly appreciated? I am not much of a tech
whiz but I am quite familiar with databases. I haven't dealt with a MS
access database with 13000 records.

Nov 13 '05 #3

P: n/a
Santosh wrote:
Since I just started my new work, I have inherited a MS Access
database which has nearly 13000 records in a single table. Now, my
mandate is to clean the database and maybe split the table into two.
Can anyone please give me some ideas on how to clean duplicated
records without going through all the records.
Any help or idea will be greatly appreciated? I am not much of a tech
whiz but I am quite familiar with databases. I haven't dealt with a MS
access database with 13000 records.


Before you go messing around with the database, I recommend that you
make a backup of it. Maybe make a copy of it and play around with the
copy and experiment with it.

Most often you split a table into a 1 to many structure. For example,
you may have a table with customer name, address, city, state, zip and
then some other data. Basically you would create a record for the
customer (1 side) and the other data on the many side.

Then you create forms and reports with the two tables.

Nov 13 '05 #4

P: n/a
Salad <oi*@vinegar.com> wrote in message news:<gf******************@newsread1.news.pas.eart hlink.net>...

Before you go messing around with the database, I recommend that you
make a backup of it. Maybe make a copy of it and play around with the
copy and experiment with it.

Most often you split a table into a 1 to many structure. For example,
you may have a table with customer name, address, city, state, zip and
then some other data. Basically you would create a record for the
customer (1 side) and the other data on the many side.

Then you create forms and reports with the two tables.


Thanks a lot. I actually did the DISTINCT ROW and it worked out fine.
How can I add a search capability to ACCESS? Eventually, my co-workers
will need to find records from the databases? Any idea will be greatly
appreciated?
Nov 13 '05 #5

P: n/a
Santosh wrote:
Salad <oi*@vinegar.com> wrote in message news:<gf******************@newsread1.news.pas.eart hlink.net>...
Before you go messing around with the database, I recommend that you
make a backup of it. Maybe make a copy of it and play around with the
copy and experiment with it.

Most often you split a table into a 1 to many structure. For example,
you may have a table with customer name, address, city, state, zip and
then some other data. Basically you would create a record for the
customer (1 side) and the other data on the many side.

Then you create forms and reports with the two tables.

Thanks a lot. I actually did the DISTINCT ROW and it worked out fine.
How can I add a search capability to ACCESS? Eventually, my co-workers
will need to find records from the databases? Any idea will be greatly
appreciated?


Open up the query or form. Then check the menu bar and see if there is
a binoculars there. That's the quick and easy way.

Nov 13 '05 #6

P: n/a
From Acc 97 Help:

1 In the Database window, click the Queries tab, and then click New.
2 In the New Query dialog box, click Find Duplicates Query Wizard.
3 Click OK.
4 Follow the directions in the wizard dialog boxes. If you don't choose to
show fields in addition to those with duplicate values, the query results
will sum the instances of each duplicate value. In the last dialog box, you
can choose to run the query or see the query's structure in Design view.

Note Microsoft Access can automatically delete all the duplicate records
in a table, although you can't use the Find Duplicates Wizard to do it...

Automatically delete duplicate records from a table

In this procedure, you create a copy of the structure of the table that
contains duplicates, make primary keys of all the fields that contain
duplicates, and then run an append query from the original table to the new
table. Because fields that are primary keys can't contain duplicate records,
this procedure produces a table without duplicate records.

To create a new table

1 In the Database window, click the Tables tab.
2 Click the name of the table you want to delete duplicate records from.
3 Click Copy on the toolbar.
4 Click Paste on the toolbar.
5 In the Paste Table As dialog box, type a name for the copied table, click
Structure Only, and then click OK.
6 Open the new table in Design view, and select the field(s) that contained
duplicates in the table you copied.

7 Click Primary Key on the toolbar to create a primary key based on the
selected fields.
8 Save and close the table.

To append only unique records to the new table

1 Create a new query based on the original table containing duplicates.
2 In query Design view, click the Query Type button on the toolbar, and
then click Append Query.
3 In the Append dialog box, click the name of the new table from the Table
Name list, and then click OK.
4 Include all the fields from the original table by dragging the asterisk
(*) to the query design grid.

5 Click Run on the toolbar.
6 Click Yes when you receive the message that you're about to append rows.
7 Click Yes when you receive the message that Microsoft Access can't append
all the records in the append query. This transfers only unique records to
your new table and discards the duplicates.
8 To see the results, open the table from the Tables tab in the Database
window.

9 When you're sure the new table has the correct unique records, you can
delete the original table, and then rename the new table using the name of
the original table.

HTH
--
-Larry-
--

"Santosh" <sp*****@usc.edu> wrote in message
news:6a**************************@posting.google.c om...
Since I just started my new work, I have inherited a MS Access
database which has nearly 13000 records in a single table. Now, my
mandate is to clean the database and maybe split the table into two.
Can anyone please give me some ideas on how to clean duplicated
records without going through all the records.
Any help or idea will be greatly appreciated? I am not much of a tech
whiz but I am quite familiar with databases. I haven't dealt with a MS
access database with 13000 records.

Nov 13 '05 #7

P: n/a
Salad <oi*@vinegar.com> wrote in message news:<Ph*****************@newsread2.news.pas.earth link.net>...
Santosh wrote:

How can I add a search capability to ACCESS? Eventually, my co-workers
will need to find records from the databases? Any idea will be greatly
appreciated?


Using Find or Filter in a form is good if there aren't too many
records, but my database is approaching 20,000 records and these
functions are slowing down. I'm working on a Query By Form record
finder. I made an unbound form in which my users enter criteria, then
it builds a query in code (querydef object) using the user's entry as
query criteria, and switches to a results form for which the query is
the source. I learned this approach from the sample queries at
Microsoft, and from a coding book "Beginning Access 2002 VBA" by Smith
and Sussman. The biggest snag I ran into was getting it to switch to
the results form and show the selected data, I ended up coding it to
close and re-open the results form each time, because the requery and
refresh functions don't work as advertised. Luckily this approach
worked better than I thought it would. It's almost done and it's much
faster than filter or find in the main form. I and one other user are
using it and we're very happy with the results.

There are many different approaches you can take depending on the
needs of your users. The examples I saw opened the query in datasheet
view instead of opening the data in a form. The example in the book
had check boxes and combo boxes for criteria selection, I thought
these were very cumbersome for our situation. I don't have enough
experience to generate a lot of ideas for you, but poke around on this
site and in books, and check out the MS examples. You might search
under "recordset", "filter", "temporary table" and "query by form" for
ideas.
Nov 13 '05 #8

P: n/a
Julia Baresch wrote:
Salad <oi*@vinegar.com> wrote in message news:<Ph*****************@newsread2.news.pas.earth link.net>...
Santosh wrote:
How can I add a search capability to ACCESS? Eventually, my co-workers
will need to find records from the databases? Any idea will be greatly
appreciated?

Using Find or Filter in a form is good if there aren't too many
records, but my database is approaching 20,000 records and these
functions are slowing down. I'm working on a Query By Form record
finder. I made an unbound form in which my users enter criteria, then
it builds a query in code (querydef object) using the user's entry as
query criteria, and switches to a results form for which the query is
the source. I learned this approach from the sample queries at
Microsoft, and from a coding book "Beginning Access 2002 VBA" by Smith
and Sussman. The biggest snag I ran into was getting it to switch to
the results form and show the selected data, I ended up coding it to
close and re-open the results form each time, because the requery and
refresh functions don't work as advertised. Luckily this approach
worked better than I thought it would. It's almost done and it's much
faster than filter or find in the main form. I and one other user are
using it and we're very happy with the results.


I sometimes use a form to filter records. Normally you are not going to
filter on each specific field on a form. I have 2 command buttons when
the op has selected the fields to filter on and the values. If the op
selects cancel, no filter takes place and the filter form simply closes.
If the op selectes the "Filter It" command button, I build a filter
string in the code behind the command button. Let's say we have 2
forms; MF = Main form, FF = FilterForm. The op presses the Filter
button on MF. THe code may be something like
Docmd.OpenForm "FF"

The FF form opens up and the op fills in the data to be filtered. When
the FilterIt button is pressed in FF, I scan the fields. Ex:

Sub CommandFilterIt_Click()
'I am filtering on 3 fields; Name, DateField, and Number
Dim strF As String
If Not IsNull(Me.Name) Then
strF = "Name = '" & Me.Name & "' And "
Endif
If Not IsNull(Me.Date) Then
strF = strF & "DateField = #" Me.Date & "# And "
ENdif
If Me.Number > 0 Then
strF = strF & "Number = " Me.Number & " And "
Endif

'remove the word And at the end of the filter line if it exists
If strF > "" Then strF = Left(strF,Len(strF)-5)

'now I set the filter on the main form
Forms!MF.Form.Filter = strF

'and I turn the filter on or off depending on if
'the filter string has some information
Forms!MF.Form.FilterOn = (strF > "")

'and I close the filter form
Docmd.Close acForm, "FF"
End Sub

Setting the filter in FF for MF would be preferable, in my opinion, then
closing the form and reopening it with a new query/recordsource.

There are many different approaches you can take depending on the
needs of your users. The examples I saw opened the query in datasheet
view instead of opening the data in a form. The example in the book
had check boxes and combo boxes for criteria selection, I thought
these were very cumbersome for our situation. I don't have enough
experience to generate a lot of ideas for you, but poke around on this
site and in books, and check out the MS examples. You might search
under "recordset", "filter", "temporary table" and "query by form" for
ideas.


Nov 13 '05 #9

P: n/a
Thanks Larry, it was of great help. I removed all duplicates and there
were tons of it.
I am looking a develop a more advanced search capability for the
database, specifying certain criteria like name, and other. The
binocular works great but I want to develop a more robust search
engine that takes in more criteria thanj just last name, I want a
combination of criterias. Please, can you shed some light on it.
Thanks,

SP

"Larry Daugherty" <La********************@verizon.net> wrote in message news:<kE******************@nwrddc01.gnilink.net>.. .
From Acc 97 Help:

1 In the Database window, click the Queries tab, and then click New.
2 In the New Query dialog box, click Find Duplicates Query Wizard.
3 Click OK.
4 Follow the directions in the wizard dialog boxes. If you don't choose to
show fields in addition to those with duplicate values, the query results
will sum the instances of each duplicate value. In the last dialog box, you
can choose to run the query or see the query's structure in Design view.

Note Microsoft Access can automatically delete all the duplicate records
in a table, although you can't use the Find Duplicates Wizard to do it...

Automatically delete duplicate records from a table

In this procedure, you create a copy of the structure of the table that
contains duplicates, make primary keys of all the fields that contain
duplicates, and then run an append query from the original table to the new
table. Because fields that are primary keys can't contain duplicate records,
this procedure produces a table without duplicate records.

To create a new table

1 In the Database window, click the Tables tab.
2 Click the name of the table you want to delete duplicate records from.
3 Click Copy on the toolbar.
4 Click Paste on the toolbar.
5 In the Paste Table As dialog box, type a name for the copied table, click
Structure Only, and then click OK.
6 Open the new table in Design view, and select the field(s) that contained
duplicates in the table you copied.

7 Click Primary Key on the toolbar to create a primary key based on the
selected fields.
8 Save and close the table.

To append only unique records to the new table

1 Create a new query based on the original table containing duplicates.
2 In query Design view, click the Query Type button on the toolbar, and
then click Append Query.
3 In the Append dialog box, click the name of the new table from the Table
Name list, and then click OK.
4 Include all the fields from the original table by dragging the asterisk
(*) to the query design grid.

5 Click Run on the toolbar.
6 Click Yes when you receive the message that you're about to append rows.
7 Click Yes when you receive the message that Microsoft Access can't append
all the records in the append query. This transfers only unique records to
your new table and discards the duplicates.
8 To see the results, open the table from the Tables tab in the Database
window.

9 When you're sure the new table has the correct unique records, you can
delete the original table, and then rename the new table using the name of
the original table.

HTH
--
-Larry-
--

"Santosh" <sp*****@usc.edu> wrote in message
news:6a**************************@posting.google.c om...
Since I just started my new work, I have inherited a MS Access
database which has nearly 13000 records in a single table. Now, my
mandate is to clean the database and maybe split the table into two.
Can anyone please give me some ideas on how to clean duplicated
records without going through all the records.
Any help or idea will be greatly appreciated? I am not much of a tech
whiz but I am quite familiar with databases. I haven't dealt with a MS
access database with 13000 records.

Nov 13 '05 #10

P: n/a
You could make Select Queries for columns. With a Select Query a dialog box
comes up and you type the name you want and the file or files come up. I use
this all the time, as it is vital once the number of records become large.
A tip is to make the search for the name plus anything else that is on the
record. An example is *John*, and you will get John plus anything else,
like John Smith, John Jones, and Littlejohn etc. The * is a wildcard that
means anything else can come before or after.
"Salad" <oi*@vinegar.com> wrote in message
news:U5*****************@newsread2.news.pas.earthl ink.net...
Julia Baresch wrote:
Salad <oi*@vinegar.com> wrote in message news:<Ph*****************@newsread2.news.pas.earth link.net>...
Santosh wrote:


How can I add a search capability to ACCESS? Eventually, my co-workers
will need to find records from the databases? Any idea will be greatly
appreciated?

Using Find or Filter in a form is good if there aren't too many
records, but my database is approaching 20,000 records and these
functions are slowing down. I'm working on a Query By Form record
finder. I made an unbound form in which my users enter criteria, then
it builds a query in code (querydef object) using the user's entry as
query criteria, and switches to a results form for which the query is
the source. I learned this approach from the sample queries at
Microsoft, and from a coding book "Beginning Access 2002 VBA" by Smith
and Sussman. The biggest snag I ran into was getting it to switch to
the results form and show the selected data, I ended up coding it to
close and re-open the results form each time, because the requery and
refresh functions don't work as advertised. Luckily this approach
worked better than I thought it would. It's almost done and it's much
faster than filter or find in the main form. I and one other user are
using it and we're very happy with the results.


I sometimes use a form to filter records. Normally you are not going to
filter on each specific field on a form. I have 2 command buttons when
the op has selected the fields to filter on and the values. If the op
selects cancel, no filter takes place and the filter form simply closes.
If the op selectes the "Filter It" command button, I build a filter
string in the code behind the command button. Let's say we have 2
forms; MF = Main form, FF = FilterForm. The op presses the Filter
button on MF. THe code may be something like
Docmd.OpenForm "FF"

The FF form opens up and the op fills in the data to be filtered. When
the FilterIt button is pressed in FF, I scan the fields. Ex:

Sub CommandFilterIt_Click()
'I am filtering on 3 fields; Name, DateField, and Number
Dim strF As String
If Not IsNull(Me.Name) Then
strF = "Name = '" & Me.Name & "' And "
Endif
If Not IsNull(Me.Date) Then
strF = strF & "DateField = #" Me.Date & "# And "
ENdif
If Me.Number > 0 Then
strF = strF & "Number = " Me.Number & " And "
Endif

'remove the word And at the end of the filter line if it exists
If strF > "" Then strF = Left(strF,Len(strF)-5)

'now I set the filter on the main form
Forms!MF.Form.Filter = strF

'and I turn the filter on or off depending on if
'the filter string has some information
Forms!MF.Form.FilterOn = (strF > "")

'and I close the filter form
Docmd.Close acForm, "FF"
End Sub

Setting the filter in FF for MF would be preferable, in my opinion, then
closing the form and reopening it with a new query/recordsource.

There are many different approaches you can take depending on the
needs of your users. The examples I saw opened the query in datasheet
view instead of opening the data in a form. The example in the book
had check boxes and combo boxes for criteria selection, I thought
these were very cumbersome for our situation. I don't have enough
experience to generate a lot of ideas for you, but poke around on this
site and in books, and check out the MS examples. You might search
under "recordset", "filter", "temporary table" and "query by form" for
ideas.

Nov 13 '05 #11

P: n/a
Hey Jim

Could you explain a bit more about the concept discussed below? How exactly
do you mean this?

Thank you

Nicolaas
----------------------

"Jim Walker" <wa*******@att.net.net> wrote in message
news:DZ*********************@bgtnsc04-news.ops.worldnet.att.net...
You could make Select Queries for columns. With a Select Query a dialog box comes up and you type the name you want and the file or files come up. I use this all the time, as it is vital once the number of records become large.
A tip is to make the search for the name plus anything else that is on the
record. An example is *John*, and you will get John plus anything else,
like John Smith, John Jones, and Littlejohn etc. The * is a wildcard that
means anything else can come before or after.
"Salad" <oi*@vinegar.com> wrote in message
news:U5*****************@newsread2.news.pas.earthl ink.net...
Julia Baresch wrote:
Salad <oi*@vinegar.com> wrote in message news:<Ph*****************@newsread2.news.pas.earth link.net>...
>Santosh wrote:
>>How can I add a search capability to ACCESS? Eventually, my co-workers>>will need to find records from the databases? Any idea will be greatly>>appreciated?
Using Find or Filter in a form is good if there aren't too many
records, but my database is approaching 20,000 records and these
functions are slowing down. I'm working on a Query By Form record
finder. I made an unbound form in which my users enter criteria, then
it builds a query in code (querydef object) using the user's entry as
query criteria, and switches to a results form for which the query is
the source. I learned this approach from the sample queries at
Microsoft, and from a coding book "Beginning Access 2002 VBA" by Smith
and Sussman. The biggest snag I ran into was getting it to switch to
the results form and show the selected data, I ended up coding it to
close and re-open the results form each time, because the requery and
refresh functions don't work as advertised. Luckily this approach
worked better than I thought it would. It's almost done and it's much
faster than filter or find in the main form. I and one other user are
using it and we're very happy with the results.


I sometimes use a form to filter records. Normally you are not going to
filter on each specific field on a form. I have 2 command buttons when
the op has selected the fields to filter on and the values. If the op
selects cancel, no filter takes place and the filter form simply closes.
If the op selectes the "Filter It" command button, I build a filter
string in the code behind the command button. Let's say we have 2
forms; MF = Main form, FF = FilterForm. The op presses the Filter
button on MF. THe code may be something like
Docmd.OpenForm "FF"

The FF form opens up and the op fills in the data to be filtered. When
the FilterIt button is pressed in FF, I scan the fields. Ex:

Sub CommandFilterIt_Click()
'I am filtering on 3 fields; Name, DateField, and Number
Dim strF As String
If Not IsNull(Me.Name) Then
strF = "Name = '" & Me.Name & "' And "
Endif
If Not IsNull(Me.Date) Then
strF = strF & "DateField = #" Me.Date & "# And "
ENdif
If Me.Number > 0 Then
strF = strF & "Number = " Me.Number & " And "
Endif

'remove the word And at the end of the filter line if it exists
If strF > "" Then strF = Left(strF,Len(strF)-5)

'now I set the filter on the main form
Forms!MF.Form.Filter = strF

'and I turn the filter on or off depending on if
'the filter string has some information
Forms!MF.Form.FilterOn = (strF > "")

'and I close the filter form
Docmd.Close acForm, "FF"
End Sub

Setting the filter in FF for MF would be preferable, in my opinion, then
closing the form and reopening it with a new query/recordsource.

There are many different approaches you can take depending on the
needs of your users. The examples I saw opened the query in datasheet
view instead of opening the data in a form. The example in the book
had check boxes and combo boxes for criteria selection, I thought
these were very cumbersome for our situation. I don't have enough
experience to generate a lot of ideas for you, but poke around on this
site and in books, and check out the MS examples. You might search
under "recordset", "filter", "temporary table" and "query by form" for
ideas.


---
Please immediately let us know (by phone or return email) if (a) this email
contains a virus
(b) you are not the intended recipient
(c) you consider this email to be spam.
We have done our utmost to make sure that
none of the above are applicable. THANK YOU
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.692 / Virus Database: 453 - Release Date: 28/05/2004
Nov 13 '05 #12

P: n/a
sp*****@usc.edu (Santosh) wrote in message news:<6a**************************@posting.google. com>...
Thanks Larry, it was of great help. I removed all duplicates and there
were tons of it.
I am looking a develop a more advanced search capability for the
database, specifying certain criteria like name, and other. The
binocular works great but I want to develop a more robust search
engine that takes in more criteria thanj just last name, I want a
combination of criterias. Please, can you shed some light on it.
Thanks,

SP
The approach I described earlier, Query by Form, is good for this
because you can put any fields you want in the criteria form, and the
user can enter any combination for the search. I have First and Last
Names, Member Number, SS #, City, State, Zip Code, and Telephone
number. Sometimes we get mail with incomplete information and have to
find the record by Last Name and City, or something like that.
From: Salad (oi*@vinegar.com)
Subject: Re: Data Cleaning
Setting the filter in FF for MF would be preferable, in my opinion,

then
<closing the form and reopening it with a new query/recordsource.

Maybe, but you're still using the Filter, which was never fast and is
too slow to use with a lot of records. We were using it all along in
Filter by Form. It's the slowest aspect of the database. We're
finding Query by Form is much faster, not just to get the results, but
to advance through the records.

Julia
Nov 13 '05 #13

This discussion thread is closed

Replies have been disabled for this discussion.