Clean up duplicate data in multiple, related tables

HLCruz via AccessMonster.com

I am working with a database that has client information separated in to 4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition there
are related tables tGifts and tCalls. The database has roughly 22,000
records but should only have around 6,000. The remaining records are
duplicates, but in many cases the correct data for one person is spread out
between the duplicate records and related tables. I need to be able to
delete the duplicates while maintaining the integrity of the data and the
relationships with the other tables. Any help on how to do this would be
appreciated.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 2 '06 #1

Subscribe Post Reply

3277

William Hindman

http://www.fabalou.com/Access/Querie...pe_records.asp

....try starting here ...I've always found cleaning up dirty data to be the
most difficult task when taking over an existing database from a client
....not so much technically difficult but rather just a royal pita ...hth.

William Hindman

"HLCruz via AccessMonster.com" <u27207@uwewrote in message
news:672b99dabfa0b@uwe...

>I am working with a database that has client information separated in to 4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition
there
are related tables tGifts and tCalls. The database has roughly 22,000
records but should only have around 6,000. The remaining records are
duplicates, but in many cases the correct data for one person is spread
out
between the duplicate records and related tables. I need to be able to
delete the duplicates while maintaining the integrity of the data and the
relationships with the other tables. Any help on how to do this would be
appreciated.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 2 '06 #2

HLCruz via AccessMonster.com

Thanks William, I tried a similar approach already unfortunately not only is
my data duplicated, but the valid data is spread out between the duplicates
related tables. Deleting duplicates in the main table just lead to orphan
records in other tables or missing information. I think I'm stuck doing this
manually ...

A philosphical question: What benefit to a database design and function is
there to separate client information into multiple, related tables - i.e. -
address, phone, email. This is a small non-profit organization, most clients
have only one address or phone number and those that do have multiple
contacts it's usually just a home, cell and work phone #. This is the first
time I've come across this structure ... I appreciate any feedback.

Thanks,
Heather

William Hindman wrote:

>http://www.fabalou.com/Access/Querie...pe_records.asp

...try starting here ...I've always found cleaning up dirty data to be the
most difficult task when taking over an existing database from a client
...not so much technically difficult but rather just a royal pita ...hth.

William Hindman

>>I am working with a database that has client information separated in to 4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition
[quoted text clipped - 7 lines]
>relationships with the other tables. Any help on how to do this would be
appreciated.

--
Message posted via http://www.accessmonster.com

Oct 2 '06 #3

William Hindman

....there are no rules that work for all databases ...even codd's laws are
applicable only when they make sense ...the problem of course is that your
client is a small non-profit now and the single table name, address, phones,
e addresses works "ok" ...but then it gets passed on or they grow or
something changes ...and now they need separate billing, shipping, and
street addresses in addition to a po box ...with the flat table design you
just keep on adding fields ...which of course means that everytime you pull
that record all of that data has to be filtered.

....the fully (well at least third normal) normalized db structure that I
default to uses a super entity with breakouts for addresses, phones,
e-addresses, and c/s/z/c on one side and entity structures such as
organizations and people on the other all related through the super entity
....which allows me to share the data without duplicating it ...I use it in
virtually every case because I know it inside out and have a code framework
built around it that makes it quick and easy to adapt to a wide variety of
client needs.

....but for you, in this specific instance, its way too much ...and there is
no apparent reason why you could not denrmalize the data into a simpler
relational model if that works best for you.

....helen feddema has some good examples and explanations on how to go about
denormalizing existing data http://www.helenfeddema.com/

....as for deleting duplicate records, the query approach works fine ...you
just need to make sure you get all the child tables deduped before the
parent tables ...that's where a backup covers your behind in case you make a
mistake ...once you have the exact duplicates deleted, then you can back up
and repeat the process looking for near duplicates containing related data
in different fields ...its repetitive and a royal pita ...but if you pay
attention, it works.

....hth

William Hindman

"HLCruz via AccessMonster.com" <u27207@uwewrote in message
news:672c204095fa7@uwe...

Thanks William, I tried a similar approach already unfortunately not only
is
my data duplicated, but the valid data is spread out between the
duplicates
related tables. Deleting duplicates in the main table just lead to orphan
records in other tables or missing information. I think I'm stuck doing
this
manually ...

A philosphical question: What benefit to a database design and function
is
there to separate client information into multiple, related tables -
i.e. -
address, phone, email. This is a small non-profit organization, most
clients
have only one address or phone number and those that do have multiple
contacts it's usually just a home, cell and work phone #. This is the
first
time I've come across this structure ... I appreciate any feedback.

Thanks,
Heather

William Hindman wrote:
>>http://www.fabalou.com/Access/Querie...pe_records.asp

...try starting here ...I've always found cleaning up dirty data to be the
most difficult task when taking over an existing database from a client
...not so much technically difficult but rather just a royal pita ...hth.

William Hindman

>>>I am working with a database that has client information separated in to
4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition
[quoted text clipped - 7 lines]
>>relationships with the other tables. Any help on how to do this would
be
appreciated.

--
Message posted via http://www.accessmonster.com

Oct 2 '06 #4

HLCruz via AccessMonster.com

Thanks so much for your comments - I will definately have some more
converstations with the company to see if the growth is expected like you
mentioned. If not, I will probably denormalize. Either way, I now see the
motivation that was behind the structure.

As for the duplicates, I didn't think about taking the time to dedupe each
child table. That makes perfect sense and I will start today. Thanks again
for your help.

William Hindman wrote:

>...there are no rules that work for all databases ...even codd's laws are
applicable only when they make sense ...the problem of course is that your
client is a small non-profit now and the single table name, address, phones,
e addresses works "ok" ...but then it gets passed on or they grow or
something changes ...and now they need separate billing, shipping, and
street addresses in addition to a po box ...with the flat table design you
just keep on adding fields ...which of course means that everytime you pull
that record all of that data has to be filtered.

...the fully (well at least third normal) normalized db structure that I
default to uses a super entity with breakouts for addresses, phones,
e-addresses, and c/s/z/c on one side and entity structures such as
organizations and people on the other all related through the super entity
...which allows me to share the data without duplicating it ...I use it in
virtually every case because I know it inside out and have a code framework
built around it that makes it quick and easy to adapt to a wide variety of
client needs.

...but for you, in this specific instance, its way too much ...and there is
no apparent reason why you could not denrmalize the data into a simpler
relational model if that works best for you.

...helen feddema has some good examples and explanations on how to go about
denormalizing existing data http://www.helenfeddema.com/

...as for deleting duplicate records, the query approach works fine ...you
just need to make sure you get all the child tables deduped before the
parent tables ...that's where a backup covers your behind in case you make a
mistake ...once you have the exact duplicates deleted, then you can back up
and repeat the process looking for near duplicates containing related data
in different fields ...its repetitive and a royal pita ...but if you pay
attention, it works.

...hth

William Hindman

>Thanks William, I tried a similar approach already unfortunately not only
is
[quoted text clipped - 34 lines]

>>>be
appreciated.

--
Message posted via http://www.accessmonster.com

Oct 3 '06 #5

by: Catherine Jo Morgan | last post by:

Can I set it up so that a certain combination of fields can't contain the same entries, on another record? e.g. a combination of FirstName/LastName/address? Or FirstName/LastName/phone? Or...

Microsoft Access / VBA

Duplicate value in field or relationship

by: Donna Price | last post by:

I have a problem with a database which I've recently separated into several related tables. I have a main data entry form, which has several subforms on it on tabbed pages. The first subform is...

Microsoft Access / VBA

Data Adapter Configuration Wizard...

by: Rick | last post by:

The data adapter wizard allows you to add more than one table, but that doesn't seem to work right when setting up a dataset. Some of the documentation I have read states that only one table...

Visual Basic .NET

Handling data relations in Gridview

by: sutphinwb | last post by:

Hi - This could be a simple question. When I relate two tables in a datasetet, how do I get that relation to show up in a GridView? The only way I've done it, is to create a separate table in the...

ASP.NET

by: Aaron | last post by:

Hi, It is possible to establish relationships between tables within a dataset and this allows some useful features. For example, given a row from a table you can use GetParentRow,...

Visual Basic .NET

Query to delete data in multiple tables

by: jpr | last post by:

Hello, My database has 5 tables. WHen I add data to one table, it runs an append query that copies three records to other 4 tables. The main table is MASTER. The data I copy are: ID, SSN and...

Microsoft Access / VBA

ASP+Access - Need sample code for 3 table Internet/Server based Data Base

by: Will | last post by:

- I know enough ASP and Access to be dangerous :) - I need to put up a data base on our web server with 3 related tables. - They will be accessed by a limited number of people. - Each user will...

ASP / Active Server Pages

Duplicate the record in form and subform

by: claus | last post by:

Hi, I am not a programmer but tries anyway to program a feature in a form where I am able to copy entries for the form and a subform. I have tried follow the guide here...

Microsoft Access / VBA

Form to choose a duplicate value

by: Shawn Yates | last post by:

I am creating a Zip Code search tool. I have a Table with every zip code and its corresponding city and state. I have also created an input table that allows the user to input cities and states....

Microsoft Access / VBA

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

Clean up duplicate data in multiple, related tables

Similar topics