Bytes | Software Development & Data Engineering Community

How to develop/acquire effective duplicate matching routine?

I'm not sure if I am posting this message in the right place. Please let me know if I should redirect this question elsewhere.

I'm currently working on a large database cleanup and consolidation project where I own pretty much everything to do with the project. The database represents a healthcare network.

Part of the data cleanup is merging duplicate doctors. The data is split into a master table plus additional tables for addresses, IDs, etc. The current database contains data drawn from three different sources over time, so the range of duplicates is pretty vast.

I have created a semi-effective duplicate-finding routine in SQL Server T-SQL, but it is slow and not very flexible. Ideally I'd like the same duplicate finding/matching routine to serve several purposes: cleaning up current data, validating new data entered by users (checking whether a provider already exists), and properly matching doctors from external data sources to doctors in our internal database.

Highly accurate and efficient duplicate control is a crucial part of this project, and what I've created so far doesn't work as well as I would like. There are so many variations of names, IDs, etc. that I'm not sure how to improve it. I imagine there's some kind of confidence-level technique I could use.
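For illustration, a confidence-level technique of the kind mentioned above could be sketched as a weighted score that combines fuzzy name similarity with exact matches on normalized IDs. This is only a minimal sketch in Python, not the T-SQL routine described in this post. The field names (name, license_id, zip), the weights, and the sample records are all hypothetical, and difflib's ratio is just one possible similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical weights; they would need tuning against real data.
WEIGHTS = {"name": 0.5, "license_id": 0.4, "zip": 0.1}


def normalize_name(text):
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in text.lower())
    return " ".join(cleaned.split())


def normalize_id(text):
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def match_confidence(rec_a, rec_b):
    """Weighted score in [0, 1]; higher suggests the same doctor."""
    score = WEIGHTS["name"] * name_similarity(rec_a["name"], rec_b["name"])
    for field in ("license_id", "zip"):
        a, b = rec_a.get(field), rec_b.get(field)
        # Only award the weight when both values exist and agree.
        if a and b and normalize_id(a) == normalize_id(b):
            score += WEIGHTS[field]
    return score


# Made-up sample records, for illustration only.
doc_a = {"name": "Smith, John A.", "license_id": "MD-12345", "zip": "60601"}
doc_b = {"name": "smith john a", "license_id": "md12345", "zip": "60601"}
doc_c = {"name": "Garcia, Maria", "license_id": "MD-99999", "zip": "10001"}

print(match_confidence(doc_a, doc_b))
print(match_confidence(doc_a, doc_c))
```

With a score like this, pairs above a high threshold could be merged automatically and pairs in a middle band sent for manual review. The thresholds themselves are assumptions to calibrate on known duplicates. The same scoring function could in principle back all three uses: cleanup, user-entry validation, and matching external sources.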

I've tried searching online for tips, but I can't seem to find anything specific to my situation. Does anyone have ideas or comments on how they would proceed? I can currently develop in SQL Server (T-SQL) and VB.NET, and I'm familiar with creating CLR routines (coupling VB.NET with SQL Server). I'm also considering outsourcing this component, but I don't know where to go for that either.

Thanks,

Zach
Sep 18 '10 #1
