473,396 Members | 1,938 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

Clean up duplicate data in multiple, related tables

I am working with a database that has client information separated in to 4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition there
are related tables tGifts and tCalls. The database has roughly 22,000
records but should only have around 6,000. The remaining records are
duplicates, but in many cases the correct data for one person is spread out
between the duplicate records and related tables. I need to be able to
delete the duplicates while maintaining the integrity of the data and the
relationships with the other tables. Any help on how to do this would be
appreciated.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 2 '06 #1
4 3277
http://www.fabalou.com/Access/Querie...pe_records.asp

....try starting here ...I've always found cleaning up dirty data to be the
most difficult task when taking over an existing database from a client
....not so much technically difficult but rather just a royal pita ...hth.

William Hindman

"HLCruz via AccessMonster.com" <u27207@uwewrote in message
news:672b99dabfa0b@uwe...
>I am working with a database that has client information separated in to 4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition
there
are related tables tGifts and tCalls. The database has roughly 22,000
records but should only have around 6,000. The remaining records are
duplicates, but in many cases the correct data for one person is spread
out
between the duplicate records and related tables. I need to be able to
delete the duplicates while maintaining the integrity of the data and the
relationships with the other tables. Any help on how to do this would be
appreciated.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 2 '06 #2
Thanks William, I tried a similar approach already unfortunately not only is
my data duplicated, but the valid data is spread out between the duplicates
related tables. Deleting duplicates in the main table just lead to orphan
records in other tables or missing information. I think I'm stuck doing this
manually ...

A philosphical question: What benefit to a database design and function is
there to separate client information into multiple, related tables - i.e. -
address, phone, email. This is a small non-profit organization, most clients
have only one address or phone number and those that do have multiple
contacts it's usually just a home, cell and work phone #. This is the first
time I've come across this structure ... I appreciate any feedback.

Thanks,
Heather

William Hindman wrote:
>http://www.fabalou.com/Access/Querie...pe_records.asp

...try starting here ...I've always found cleaning up dirty data to be the
most difficult task when taking over an existing database from a client
...not so much technically difficult but rather just a royal pita ...hth.

William Hindman
>>I am working with a database that has client information separated in to 4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition
[quoted text clipped - 7 lines]
>relationships with the other tables. Any help on how to do this would be
appreciated.
--
Message posted via http://www.accessmonster.com

Oct 2 '06 #3
....there are no rules that work for all databases ...even codd's laws are
applicable only when they make sense ...the problem of course is that your
client is a small non-profit now and the single table name, address, phones,
e addresses works "ok" ...but then it gets passed on or they grow or
something changes ...and now they need separate billing, shipping, and
street addresses in addition to a po box ...with the flat table design you
just keep on adding fields ...which of course means that everytime you pull
that record all of that data has to be filtered.

....the fully (well at least third normal) normalized db structure that I
default to uses a super entity with breakouts for addresses, phones,
e-addresses, and c/s/z/c on one side and entity structures such as
organizations and people on the other all related through the super entity
....which allows me to share the data without duplicating it ...I use it in
virtually every case because I know it inside out and have a code framework
built around it that makes it quick and easy to adapt to a wide variety of
client needs.

....but for you, in this specific instance, its way too much ...and there is
no apparent reason why you could not denrmalize the data into a simpler
relational model if that works best for you.

....helen feddema has some good examples and explanations on how to go about
denormalizing existing data http://www.helenfeddema.com/

....as for deleting duplicate records, the query approach works fine ...you
just need to make sure you get all the child tables deduped before the
parent tables ...that's where a backup covers your behind in case you make a
mistake ...once you have the exact duplicates deleted, then you can back up
and repeat the process looking for near duplicates containing related data
in different fields ...its repetitive and a royal pita ...but if you pay
attention, it works.

....hth

William Hindman

"HLCruz via AccessMonster.com" <u27207@uwewrote in message
news:672c204095fa7@uwe...
Thanks William, I tried a similar approach already unfortunately not only
is
my data duplicated, but the valid data is spread out between the
duplicates
related tables. Deleting duplicates in the main table just lead to orphan
records in other tables or missing information. I think I'm stuck doing
this
manually ...

A philosphical question: What benefit to a database design and function
is
there to separate client information into multiple, related tables -
i.e. -
address, phone, email. This is a small non-profit organization, most
clients
have only one address or phone number and those that do have multiple
contacts it's usually just a home, cell and work phone #. This is the
first
time I've come across this structure ... I appreciate any feedback.

Thanks,
Heather

William Hindman wrote:
>>http://www.fabalou.com/Access/Querie...pe_records.asp

...try starting here ...I've always found cleaning up dirty data to be the
most difficult task when taking over an existing database from a client
...not so much technically difficult but rather just a royal pita ...hth.

William Hindman
>>>I am working with a database that has client information separated in to
4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition
[quoted text clipped - 7 lines]
>>relationships with the other tables. Any help on how to do this would
be
appreciated.

--
Message posted via http://www.accessmonster.com

Oct 2 '06 #4
Thanks so much for your comments - I will definately have some more
converstations with the company to see if the growth is expected like you
mentioned. If not, I will probably denormalize. Either way, I now see the
motivation that was behind the structure.

As for the duplicates, I didn't think about taking the time to dedupe each
child table. That makes perfect sense and I will start today. Thanks again
for your help.

William Hindman wrote:
>...there are no rules that work for all databases ...even codd's laws are
applicable only when they make sense ...the problem of course is that your
client is a small non-profit now and the single table name, address, phones,
e addresses works "ok" ...but then it gets passed on or they grow or
something changes ...and now they need separate billing, shipping, and
street addresses in addition to a po box ...with the flat table design you
just keep on adding fields ...which of course means that everytime you pull
that record all of that data has to be filtered.

...the fully (well at least third normal) normalized db structure that I
default to uses a super entity with breakouts for addresses, phones,
e-addresses, and c/s/z/c on one side and entity structures such as
organizations and people on the other all related through the super entity
...which allows me to share the data without duplicating it ...I use it in
virtually every case because I know it inside out and have a code framework
built around it that makes it quick and easy to adapt to a wide variety of
client needs.

...but for you, in this specific instance, its way too much ...and there is
no apparent reason why you could not denrmalize the data into a simpler
relational model if that works best for you.

...helen feddema has some good examples and explanations on how to go about
denormalizing existing data http://www.helenfeddema.com/

...as for deleting duplicate records, the query approach works fine ...you
just need to make sure you get all the child tables deduped before the
parent tables ...that's where a backup covers your behind in case you make a
mistake ...once you have the exact duplicates deleted, then you can back up
and repeat the process looking for near duplicates containing related data
in different fields ...its repetitive and a royal pita ...but if you pay
attention, it works.

...hth

William Hindman
>Thanks William, I tried a similar approach already unfortunately not only
is
[quoted text clipped - 34 lines]
>>>be
appreciated.
--
Message posted via http://www.accessmonster.com

Oct 3 '06 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: Catherine Jo Morgan | last post by:
Can I set it up so that a certain combination of fields can't contain the same entries, on another record? e.g. a combination of FirstName/LastName/address? Or FirstName/LastName/phone? Or...
3
by: Donna Price | last post by:
I have a problem with a database which I've recently separated into several related tables. I have a main data entry form, which has several subforms on it on tabbed pages. The first subform is...
5
by: Rick | last post by:
The data adapter wizard allows you to add more than one table, but that doesn't seem to work right when setting up a dataset. Some of the documentation I have read states that only one table...
5
by: sutphinwb | last post by:
Hi - This could be a simple question. When I relate two tables in a datasetet, how do I get that relation to show up in a GridView? The only way I've done it, is to create a separate table in the...
3
by: Aaron | last post by:
Hi, It is possible to establish relationships between tables within a dataset and this allows some useful features. For example, given a row from a table you can use GetParentRow,...
1
by: jpr | last post by:
Hello, My database has 5 tables. WHen I add data to one table, it runs an append query that copies three records to other 4 tables. The main table is MASTER. The data I copy are: ID, SSN and...
5
by: Will | last post by:
- I know enough ASP and Access to be dangerous :) - I need to put up a data base on our web server with 3 related tables. - They will be accessed by a limited number of people. - Each user will...
0
by: claus | last post by:
Hi, I am not a programmer but tries anyway to program a feature in a form where I am able to copy entries for the form and a subform. I have tried follow the guide here...
2
by: Shawn Yates | last post by:
I am creating a Zip Code search tool. I have a Table with every zip code and its corresponding city and state. I have also created an input table that allows the user to input cities and states....
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.