Good evening all, I have a problem that I've been banging my head against for a few hours and was looking for some advice. Consider a table with the following data:
SSN (int)
first_name (char)
last_name (char)
middle_name (char)
A sampling of data would appear as such:
000112222, Jones, Jeff, J.
000112222, Jones, Jeff, Jason
111223333, Wright, Kevin, ''
222333444, Jones, Jeff, J.
Knowing that a unique person may have an entry for the same SSN due to variations in the name, I need to find out if there are any instances where the best match of a person's name might be on more than one distinct SSN. In the above example, Jeff Jones appears on one SSN twice due to one record having his middle name spelled out. This is not an issue. However, Jeff Jones also appears on a completely different SSN. This is the discrepancy I need to look for.
Now, I have in the past compared names on the same SSN by using DIFFERENCE() and/or SOUNDEX() to score possible data anomalies, but I'm sort of stuck on checking the names across multiple SSN's to find when a unique person (based on names) have more than one SSN record.