By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
424,456 Members | 1,472 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 424,456 IT Pros & Developers. It's quick & easy.

Table Design Headache - How do I preserve data integrity?

P: n/a
I am designing an employer database, and I had thought that my
tabledesign was pretty sound, but a new revelation has ruined my
design. What I guess i really need is a way to uniquely identify
employers.

originally I had planned to have an employer table, with autonumber
employer id #, the employer name, and the employer federal
identification number. then i have a location table, since each
employer can have multiple locations, i.e. walmart, home depot, burger
king, etc. the location table would have an autonumber location id, the
address, and the employer id number as a foreign key. the good part
about this design, is that i can prevent duplicate employer entries by
requiring the federal employer id number (FEIN) to be unique. this
would prevent someone from entering a mcdonalds entry, then another
user entering a mc donalds entry, for the same FEIN. the problem with
this is that i had incorrectly assumed that each employer had a unique
FEIN, where in fact, multiple mcdonalds locations can have different
FEIN's since they are franchises, and owned independently. with this
model, the only option that I can see would be to move the FEIN to the
location table, which would create repeated values for employers like
walmart, where every location has the same FEIN, and also invite data
integrity problems, since users could enter different spellings for
companies with the same name, since they have different FEIN's. this
would cause problems later, when trying to do a query for all mcdonalds
locations if they were entered in multiple ways.

Can anyone give me a tip here on how to better design my tables to
prevent duplicate entries?

Jan 16 '07 #1
Share this Question
Share on Google+
4 Replies


P: n/a
Deus402 wrote:
I am designing an employer database, and I had thought that my
tabledesign was pretty sound, but a new revelation has ruined my
design. What I guess i really need is a way to uniquely identify
employers.

originally I had planned to have an employer table, with autonumber
employer id #, the employer name, and the employer federal
identification number. then i have a location table, since each
employer can have multiple locations, i.e. walmart, home depot, burger
king, etc. the location table would have an autonumber location id, the
address, and the employer id number as a foreign key. the good part
about this design, is that i can prevent duplicate employer entries by
requiring the federal employer id number (FEIN) to be unique. this
would prevent someone from entering a mcdonalds entry, then another
user entering a mc donalds entry, for the same FEIN. the problem with
this is that i had incorrectly assumed that each employer had a unique
FEIN, where in fact, multiple mcdonalds locations can have different
FEIN's since they are franchises, and owned independently. with this
model, the only option that I can see would be to move the FEIN to the
location table, which would create repeated values for employers like
walmart, where every location has the same FEIN, and also invite data
integrity problems, since users could enter different spellings for
companies with the same name, since they have different FEIN's. this
would cause problems later, when trying to do a query for all mcdonalds
locations if they were entered in multiple ways.

Can anyone give me a tip here on how to better design my tables to
prevent duplicate entries?
I see this as a challenge as well. I believe you are correct in that
FEINs should be relegated to the locations table.

You need to thwart attempts to insert multiple, identical "master"
employer records. Since there may be no good "master" employer
identifier, you could coerce users to search for keywords, perhaps
employing "soundex" functionality to maximize the chances for spotting
existing entries. Soundex routines have been posted in Access groups.

Let us know how you get along!
--
Smartin
Jan 17 '07 #2

P: n/a
Using a soundex algorithm to try to minimize different/mis- spellings
is definately an interesting idea, and i checked it out on wikipedia.
it seems to be better suited to checking for duplicate surnames than
business names though. just in my brief overview i got of it i am not
sure how well it would handle names such as "ALEGENT HEALTHCARE FOOD &
NUTRITION SERVICE" and additionally even if you did use the algorithm
to generate a soundex code for that title, it wouldn't prevent another
user from entering a new employer as just alegent healthcare. or
alegent health care. or alegent healthcare food and nutrition svc. etc.
there may be other algorithms that work better, but it seems like it
can get very complex very fast. i do not really want to build in a
version of agrep into my database.

also, maybe i am wrong, and the soundex algorithm will work fine for
what i am trying to do. there must be some sort of precedent, this
doesn't seem like it would be an uncommon problem.

at any rate, thank you for your response, i appreciate any help i can
get.

Smartin wrote:
Deus402 wrote:
I am designing an employer database, and I had thought that my
tabledesign was pretty sound, but a new revelation has ruined my
design. What I guess i really need is a way to uniquely identify
employers.

originally I had planned to have an employer table, with autonumber
employer id #, the employer name, and the employer federal
identification number. then i have a location table, since each
employer can have multiple locations, i.e. walmart, home depot, burger
king, etc. the location table would have an autonumber location id, the
address, and the employer id number as a foreign key. the good part
about this design, is that i can prevent duplicate employer entries by
requiring the federal employer id number (FEIN) to be unique. this
would prevent someone from entering a mcdonalds entry, then another
user entering a mc donalds entry, for the same FEIN. the problem with
this is that i had incorrectly assumed that each employer had a unique
FEIN, where in fact, multiple mcdonalds locations can have different
FEIN's since they are franchises, and owned independently. with this
model, the only option that I can see would be to move the FEIN to the
location table, which would create repeated values for employers like
walmart, where every location has the same FEIN, and also invite data
integrity problems, since users could enter different spellings for
companies with the same name, since they have different FEIN's. this
would cause problems later, when trying to do a query for all mcdonalds
locations if they were entered in multiple ways.

Can anyone give me a tip here on how to better design my tables to
prevent duplicate entries?

I see this as a challenge as well. I believe you are correct in that
FEINs should be relegated to the locations table.

You need to thwart attempts to insert multiple, identical "master"
employer records. Since there may be no good "master" employer
identifier, you could coerce users to search for keywords, perhaps
employing "soundex" functionality to maximize the chances for spotting
existing entries. Soundex routines have been posted in Access groups.

Let us know how you get along!
--
Smartin
Jan 17 '07 #3

P: n/a
Deus402 wrote:
Using a soundex algorithm to try to minimize different/mis- spellings
is definately an interesting idea, and i checked it out on wikipedia.
it seems to be better suited to checking for duplicate surnames than
business names though. just in my brief overview i got of it i am not
sure how well it would handle names such as "ALEGENT HEALTHCARE FOOD &
NUTRITION SERVICE" and additionally even if you did use the algorithm
to generate a soundex code for that title, it wouldn't prevent another
user from entering a new employer as just alegent healthcare. or
alegent health care. or alegent healthcare food and nutrition svc. etc.
True, and I was going to mention something about acuity of the users
being a factor. You can guide them ad infinitum, make suggestions,
double dare, etc. But since you don't have a definitive way to identify
the entities, it comes down to user choices whether a record gets added
or not.

I used to work in health care. In an effort to minimize creation of
duplicate patient records (and indeed, unduplicate a sizable existing
mess) my employer engaged the services of a product that used Bayesian
techniques to identify similarities in patient records. The technology
is complex. Dozens of fields were evaluated, fields were assigned
weights, types of differences were assigned weights, all was added up to
a probability of duplication. During the cleanup phase, hundreds of
thousands of records were "automatically" merged, but thousands more
were left to human eyes for examination.

I suspect all that is probably outside the scope of the pursuits of mere
mortals in Access applications, but I mention it as a example of where
things can go.

The soundex option still seems like it still might be viable, though in
the end you may be left with residual cleanup.

Given your example, some things come to mind. Soundex only the first N
characters. Or soundex each word separately and make an attempt to find
similarities. Assuming your database will have only thousands, and not
millions of entities to compare, you might well be able to lead your
users to the correct choice most of the time.
there may be other algorithms that work better, but it seems like it
can get very complex very fast. i do not really want to build in a
version of agrep into my database.

also, maybe i am wrong, and the soundex algorithm will work fine for
what i am trying to do. there must be some sort of precedent, this
doesn't seem like it would be an uncommon problem.
I would like to see other ideas as well.

And me.
at any rate, thank you for your response, i appreciate any help i can
get.

Smartin wrote:
>Deus402 wrote:
>>I am designing an employer database, and I had thought that my
tabledesign was pretty sound, but a new revelation has ruined my
design. What I guess i really need is a way to uniquely identify
employers.

originally I had planned to have an employer table, with autonumber
employer id #, the employer name, and the employer federal
identification number. then i have a location table, since each
employer can have multiple locations, i.e. walmart, home depot, burger
king, etc. the location table would have an autonumber location id, the
address, and the employer id number as a foreign key. the good part
about this design, is that i can prevent duplicate employer entries by
requiring the federal employer id number (FEIN) to be unique. this
would prevent someone from entering a mcdonalds entry, then another
user entering a mc donalds entry, for the same FEIN. the problem with
this is that i had incorrectly assumed that each employer had a unique
FEIN, where in fact, multiple mcdonalds locations can have different
FEIN's since they are franchises, and owned independently. with this
model, the only option that I can see would be to move the FEIN to the
location table, which would create repeated values for employers like
walmart, where every location has the same FEIN, and also invite data
integrity problems, since users could enter different spellings for
companies with the same name, since they have different FEIN's. this
would cause problems later, when trying to do a query for all mcdonalds
locations if they were entered in multiple ways.

Can anyone give me a tip here on how to better design my tables to
prevent duplicate entries?
I see this as a challenge as well. I believe you are correct in that
FEINs should be relegated to the locations table.

You need to thwart attempts to insert multiple, identical "master"
employer records. Since there may be no good "master" employer
identifier, you could coerce users to search for keywords, perhaps
employing "soundex" functionality to maximize the chances for spotting
existing entries. Soundex routines have been posted in Access groups.

Let us know how you get along!
--
Smartin

--
Smartin
Jan 18 '07 #4

P: n/a
Your database design does not model "real life" with sufficent accuracy.

McDonald's is an employer... the McDonalds company has its own employees,
but it does not own and operate all McDonald's location. The same is true of
many restaurant and fast-food chains. You need another level for the
Franchisees who own one or more locations. You just don't have enough
levels... or perhaps the Franchisees should be on the same level as
McDonalds, with a Field indicating the Franchise, if any.

Any business database must model the real world with sufficient accuracy to
meet the business purpose... if your business purpose requires identifying
the employer, as it appears to, then you need to model that actual real-life
structure. If it was to collect statistics on customer satisfaction with
different locations, then you might not need the franchisee to be
identified. And, if there's no need to accumulate the information for all
franchisees of a given brand, maybe you don't need McDonalds listed at all,
or only need to say ABC Partnership McDonald's Franchisee.

Caveat: you will find that some franchisees operate locations for multiple
brands... I couldn't tell you if some of them use the same employer number
for different brand locations, but you'd need to know that, too.

Larry Linson
Microsoft Access MVP
"Deus402" <dp****@gmail.comwrote in message
news:11*********************@q2g2000cwa.googlegrou ps.com...
>I am designing an employer database, and I had thought that my
tabledesign was pretty sound, but a new revelation has ruined my
design. What I guess i really need is a way to uniquely identify
employers.

originally I had planned to have an employer table, with autonumber
employer id #, the employer name, and the employer federal
identification number. then i have a location table, since each
employer can have multiple locations, i.e. walmart, home depot, burger
king, etc. the location table would have an autonumber location id, the
address, and the employer id number as a foreign key. the good part
about this design, is that i can prevent duplicate employer entries by
requiring the federal employer id number (FEIN) to be unique. this
would prevent someone from entering a mcdonalds entry, then another
user entering a mc donalds entry, for the same FEIN. the problem with
this is that i had incorrectly assumed that each employer had a unique
FEIN, where in fact, multiple mcdonalds locations can have different
FEIN's since they are franchises, and owned independently. with this
model, the only option that I can see would be to move the FEIN to the
location table, which would create repeated values for employers like
walmart, where every location has the same FEIN, and also invite data
integrity problems, since users could enter different spellings for
companies with the same name, since they have different FEIN's. this
would cause problems later, when trying to do a query for all mcdonalds
locations if they were entered in multiple ways.

Can anyone give me a tip here on how to better design my tables to
prevent duplicate entries?

Jan 18 '07 #5

This discussion thread is closed

Replies have been disabled for this discussion.