By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
425,855 Members | 865 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 425,855 IT Pros & Developers. It's quick & easy.

Clean up duplicate data in multiple, related tables

P: n/a
I am working with a database that has client information separated in to 4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition there
are related tables tGifts and tCalls. The database has roughly 22,000
records but should only have around 6,000. The remaining records are
duplicates, but in many cases the correct data for one person is spread out
between the duplicate records and related tables. I need to be able to
delete the duplicates while maintaining the integrity of the data and the
relationships with the other tables. Any help on how to do this would be
appreciated.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 2 '06 #1
Share this Question
Share on Google+
4 Replies


P: n/a
http://www.fabalou.com/Access/Querie...pe_records.asp

....try starting here ...I've always found cleaning up dirty data to be the
most difficult task when taking over an existing database from a client
....not so much technically difficult but rather just a royal pita ...hth.

William Hindman

"HLCruz via AccessMonster.com" <u27207@uwewrote in message
news:672b99dabfa0b@uwe...
>I am working with a database that has client information separated in to 4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition
there
are related tables tGifts and tCalls. The database has roughly 22,000
records but should only have around 6,000. The remaining records are
duplicates, but in many cases the correct data for one person is spread
out
between the duplicate records and related tables. I need to be able to
delete the duplicates while maintaining the integrity of the data and the
relationships with the other tables. Any help on how to do this would be
appreciated.

--
Message posted via AccessMonster.com
http://www.accessmonster.com/Uwe/For...ccess/200610/1

Oct 2 '06 #2

P: n/a
Thanks William, I tried a similar approach already unfortunately not only is
my data duplicated, but the valid data is spread out between the duplicates
related tables. Deleting duplicates in the main table just lead to orphan
records in other tables or missing information. I think I'm stuck doing this
manually ...

A philosphical question: What benefit to a database design and function is
there to separate client information into multiple, related tables - i.e. -
address, phone, email. This is a small non-profit organization, most clients
have only one address or phone number and those that do have multiple
contacts it's usually just a home, cell and work phone #. This is the first
time I've come across this structure ... I appreciate any feedback.

Thanks,
Heather

William Hindman wrote:
>http://www.fabalou.com/Access/Querie...pe_records.asp

...try starting here ...I've always found cleaning up dirty data to be the
most difficult task when taking over an existing database from a client
...not so much technically difficult but rather just a royal pita ...hth.

William Hindman
>>I am working with a database that has client information separated in to 4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition
[quoted text clipped - 7 lines]
>relationships with the other tables. Any help on how to do this would be
appreciated.
--
Message posted via http://www.accessmonster.com

Oct 2 '06 #3

P: n/a
....there are no rules that work for all databases ...even codd's laws are
applicable only when they make sense ...the problem of course is that your
client is a small non-profit now and the single table name, address, phones,
e addresses works "ok" ...but then it gets passed on or they grow or
something changes ...and now they need separate billing, shipping, and
street addresses in addition to a po box ...with the flat table design you
just keep on adding fields ...which of course means that everytime you pull
that record all of that data has to be filtered.

....the fully (well at least third normal) normalized db structure that I
default to uses a super entity with breakouts for addresses, phones,
e-addresses, and c/s/z/c on one side and entity structures such as
organizations and people on the other all related through the super entity
....which allows me to share the data without duplicating it ...I use it in
virtually every case because I know it inside out and have a code framework
built around it that makes it quick and easy to adapt to a wide variety of
client needs.

....but for you, in this specific instance, its way too much ...and there is
no apparent reason why you could not denrmalize the data into a simpler
relational model if that works best for you.

....helen feddema has some good examples and explanations on how to go about
denormalizing existing data http://www.helenfeddema.com/

....as for deleting duplicate records, the query approach works fine ...you
just need to make sure you get all the child tables deduped before the
parent tables ...that's where a backup covers your behind in case you make a
mistake ...once you have the exact duplicates deleted, then you can back up
and repeat the process looking for near duplicates containing related data
in different fields ...its repetitive and a royal pita ...but if you pay
attention, it works.

....hth

William Hindman

"HLCruz via AccessMonster.com" <u27207@uwewrote in message
news:672c204095fa7@uwe...
Thanks William, I tried a similar approach already unfortunately not only
is
my data duplicated, but the valid data is spread out between the
duplicates
related tables. Deleting duplicates in the main table just lead to orphan
records in other tables or missing information. I think I'm stuck doing
this
manually ...

A philosphical question: What benefit to a database design and function
is
there to separate client information into multiple, related tables -
i.e. -
address, phone, email. This is a small non-profit organization, most
clients
have only one address or phone number and those that do have multiple
contacts it's usually just a home, cell and work phone #. This is the
first
time I've come across this structure ... I appreciate any feedback.

Thanks,
Heather

William Hindman wrote:
>>http://www.fabalou.com/Access/Querie...pe_records.asp

...try starting here ...I've always found cleaning up dirty data to be the
most difficult task when taking over an existing database from a client
...not so much technically difficult but rather just a royal pita ...hth.

William Hindman
>>>I am working with a database that has client information separated in to
4
related tables - tFolder, tAddress, tEmail, tPhone number. In addition
[quoted text clipped - 7 lines]
>>relationships with the other tables. Any help on how to do this would
be
appreciated.

--
Message posted via http://www.accessmonster.com

Oct 2 '06 #4

P: n/a
Thanks so much for your comments - I will definately have some more
converstations with the company to see if the growth is expected like you
mentioned. If not, I will probably denormalize. Either way, I now see the
motivation that was behind the structure.

As for the duplicates, I didn't think about taking the time to dedupe each
child table. That makes perfect sense and I will start today. Thanks again
for your help.

William Hindman wrote:
>...there are no rules that work for all databases ...even codd's laws are
applicable only when they make sense ...the problem of course is that your
client is a small non-profit now and the single table name, address, phones,
e addresses works "ok" ...but then it gets passed on or they grow or
something changes ...and now they need separate billing, shipping, and
street addresses in addition to a po box ...with the flat table design you
just keep on adding fields ...which of course means that everytime you pull
that record all of that data has to be filtered.

...the fully (well at least third normal) normalized db structure that I
default to uses a super entity with breakouts for addresses, phones,
e-addresses, and c/s/z/c on one side and entity structures such as
organizations and people on the other all related through the super entity
...which allows me to share the data without duplicating it ...I use it in
virtually every case because I know it inside out and have a code framework
built around it that makes it quick and easy to adapt to a wide variety of
client needs.

...but for you, in this specific instance, its way too much ...and there is
no apparent reason why you could not denrmalize the data into a simpler
relational model if that works best for you.

...helen feddema has some good examples and explanations on how to go about
denormalizing existing data http://www.helenfeddema.com/

...as for deleting duplicate records, the query approach works fine ...you
just need to make sure you get all the child tables deduped before the
parent tables ...that's where a backup covers your behind in case you make a
mistake ...once you have the exact duplicates deleted, then you can back up
and repeat the process looking for near duplicates containing related data
in different fields ...its repetitive and a royal pita ...but if you pay
attention, it works.

...hth

William Hindman
>Thanks William, I tried a similar approach already unfortunately not only
is
[quoted text clipped - 34 lines]
>>>be
appreciated.
--
Message posted via http://www.accessmonster.com

Oct 3 '06 #5

This discussion thread is closed

Replies have been disabled for this discussion.