Bytes | Software Development & Data Engineering Community

Fuzzy Duplicates

Hello all,

Sorry for the newbie question. I have read many threads here and searched Google for a long time. I've tried many examples, but I just can't get this to work.


Here's the setup.


I have a table with two columns: a unique ID and a product description. Product descriptions are entered by end users, who sometimes misspell or mistype a product that already exists and add it as a new record. I need to find those duplicates and report on them.

sample data:

1 Bytes Website
2 Bytes Website - New
3 Book Store
4 Bytes Website - Expert Help - Website
5 Ice Cream Shop
6 Cat Food
7 Cat Food - CatFood.com - NEW YORK

From the sample above, I would like to see only the three "Bytes Website" records and the two "Cat Food" records, leaving the other products out.

I'll then go in and clean up the data manually. I just need a duplicate report.
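A minimal sketch of the requested report, assuming that the text before the first " - " is the real product name and anything after it is a suffix users tacked on. That rule is a hypothetical heuristic fitted to the sample rows above, not a general fuzzy-matching method (it will not catch typos), and `normalize` and `duplicate_report` are illustrative names.

```python
import re
from collections import defaultdict

# Sample rows from the question: (unique ID, product description)
SAMPLE_ROWS = [
    (1, "Bytes Website"),
    (2, "Bytes Website - New"),
    (3, "Book Store"),
    (4, "Bytes Website - Expert Help - Website"),
    (5, "Ice Cream Shop"),
    (6, "Cat Food"),
    (7, "Cat Food - CatFood.com - NEW YORK"),
]


def normalize(description):
    """Hypothetical grouping key: the text before the first ' - ',
    lowercased, with everything except letters and digits removed."""
    base = description.split(" - ", 1)[0]
    return re.sub(r"[^a-z0-9]", "", base.lower())


def duplicate_report(rows):
    """Group rows by normalized key and keep only groups with 2+ members."""
    groups = defaultdict(list)
    for row_id, description in rows:
        groups[normalize(description)].append((row_id, description))
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    for key, members in duplicate_report(SAMPLE_ROWS).items():
        print(key)
        for row_id, description in members:
            print(f"  {row_id}: {description}")
```

On the sample data this groups IDs 1, 2 and 4 together and IDs 6 and 7 together, and drops the single-member groups (Book Store, Ice Cream Shop).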

Thank you.

*Sorry if I'm not explaining this well.*
Jun 30 '09 #1
Annalyzer
That's a common problem in any database where users type in free text, and it would be great to prevent users from entering duplicates. In the real world, however, they're going to happen just as you described.

You could write a script that goes through the table one record at a time, stores that record's description in a variable, and then compares it against all of the other descriptions using LIKE.
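A sketch of that record-by-record LIKE script in Python, using an in-memory SQLite table as a stand-in for the real one. The table and column names (`products`, `id`, `description`) and the helper names are assumptions. Each description is escaped and wrapped as `%description%`, so a record is flagged when another record contains its full text.

```python
import sqlite3

# Sample rows from the question: (unique ID, product description)
SAMPLE_ROWS = [
    (1, "Bytes Website"),
    (2, "Bytes Website - New"),
    (3, "Book Store"),
    (4, "Bytes Website - Expert Help - Website"),
    (5, "Ice Cream Shop"),
    (6, "Cat Food"),
    (7, "Cat Food - CatFood.com - NEW YORK"),
]


def build_demo_db(rows):
    """Load rows into an in-memory SQLite table standing in for the real one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT)")
    conn.executemany("INSERT INTO products (id, description) VALUES (?, ?)", rows)
    return conn


def like_matches(conn):
    """Take one record at a time and find the other records whose
    description contains it, using LIKE '%<description>%'."""
    matches = []
    rows = conn.execute("SELECT id, description FROM products ORDER BY id").fetchall()
    for row_id, description in rows:
        # Escape LIKE wildcards so a literal % or _ in the text is matched as-is.
        escaped = (
            description.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        )
        others = conn.execute(
            "SELECT id, description FROM products "
            "WHERE id <> ? AND description LIKE ? ESCAPE '\\' ORDER BY id",
            (row_id, "%" + escaped + "%"),
        ).fetchall()
        for other_id, other_description in others:
            matches.append((row_id, description, other_id, other_description))
    return matches


if __name__ == "__main__":
    conn = build_demo_db(SAMPLE_ROWS)
    for row_id, description, other_id, other_description in like_matches(conn):
        print(f"{row_id} '{description}' ~ {other_id} '{other_description}'")
```

This only catches records that contain another record's complete description (rows 2 and 4 contain row 1; row 7 contains row 6). A typo such as "Bytse Website" would not match, the loop issues one query per row, and whether LIKE is case-sensitive depends on the database and collation.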

There is probably a better way. Hopefully someone here will be able to help you better than I can.

My preference would be to make the user search the database before entering a new description, thereby cutting down on duplicates. Write the code so that the user enters the description, but instead of updating the database right away, search for similar descriptions with LIKE and display those to the user first. Then give them the option to accept one of the descriptions found, add the new description as typed, or cancel the transaction.
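A sketch of that check-before-insert step. Instead of LIKE, it swaps in Python's `difflib.get_close_matches`, a similarity-ratio match that can also catch simple typos. The 0.6 cutoff is difflib's default and would need tuning against real data; the function names and the ("confirm"/"insert") return convention are hypothetical.

```python
import difflib

# Existing descriptions from the question's sample data.
SAMPLE_DESCRIPTIONS = [
    "Bytes Website",
    "Bytes Website - New",
    "Book Store",
    "Bytes Website - Expert Help - Website",
    "Ice Cream Shop",
    "Cat Food",
    "Cat Food - CatFood.com - NEW YORK",
]


def similar_descriptions(new_description, existing, cutoff=0.6, limit=5):
    """Return existing descriptions resembling new_description, most similar
    first. Case-insensitive; cutoff is a SequenceMatcher ratio in [0, 1]."""
    # Map lowercased text back to the stored spelling so results show the original.
    by_lower = {d.lower(): d for d in existing}
    hits = difflib.get_close_matches(
        new_description.lower(), list(by_lower), n=limit, cutoff=cutoff
    )
    return [by_lower[h] for h in hits]


def check_before_insert(new_description, existing):
    """Return ("confirm", candidates) when similar descriptions exist, so the
    UI can let the user pick one, add the new text as typed, or cancel;
    otherwise return ("insert", [])."""
    candidates = similar_descriptions(new_description, existing)
    if candidates:
        return ("confirm", candidates)
    return ("insert", [])


if __name__ == "__main__":
    print(check_before_insert("Bytse Website", SAMPLE_DESCRIPTIONS))
```

Here the misspelled "Bytse Website" is routed to the confirmation step with "Bytes Website" as the closest candidate, which a plain `LIKE '%Bytse Website%'` search would not find.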
Jul 4 '09 #2


Similar topics

17
by: Andrew McLean | last post by:
I have a problem that is suspect isn't unusual and I'm looking to see if there is any code available to help. I've Googled without success. Basically, I have two databases containing lists of...
1
by: Ray Gardener | last post by:
I was wondering if anyone had tried implementing fuzzy logic set concepts in C++, because in FL, the concept of "type" or "class" is fuzzy; things belong (or are of) a given type only by degree....
6
by: Marlene | last post by:
Hi All I have the following scenario, where I have found all the duplicates in a table, based on an order number and a part number (item).I might have something like this: Order PODate Rec...
24
by: BBands | last post by:
I have some CDs and have been archiving them on a PC. I wrote a Python script that spans the archive and returns a list of its contents: ...]. I wanted to add a search function to locate all the...
3
by: AK | last post by:
Hi Our product uses MS-SQL Server 2000. One of our customer has 10 installations with each installation stroring data in its own database. Now the customer wants to consolidate these databases...
24
by: cassetti | last post by:
Here's the issue: I have roughly 20 MS excel spreadsheets, each row contains a record. These records were hand entered by people in call centers. The problem is, there can and are duplicate...
8
by: Martin Schneider | last post by:
Hi! I have a database with approx. 2500 records. When entering a new record I'd like to avoid double entries that may differ slightly in writing. Currently I am calculating the Levenshtein...
14
by: Steve Bergman | last post by:
I'm looking for a module to do fuzzy comparison of strings. I have 2 item master files which are supposed to be identical, but they have thousands of records where the item numbers don't match in...
1
by: tskmjk55 | last post by:
Recently, I have a requirement to develop a vb.net application wherein the input excel sheet data which has an average of 5000 records should be checked for Internal duplicates (duplicates within the...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
