473,387 Members | 1,687 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

Finding Data duplication within two fields

I'm a beginner, so I will try and convey my questions as best as possible.

I have a database of approximately 20,000 companies - I will call this list LIST1, and I purchased a list to with approximately 5,000 companies - I will call this list LIST2.

I would like to take the address field from LIST2 and find any duplicates on the address field on LIST1, however the addresses do not match exactly. For example, the address on LIST1 might be 4610 TRENTON FRANKLIN ROAD, and then the address on LIST2 might be 4610 TRENTON FRANKLIN STE 2 - how would I find this potential duplicate?

I would need a select statement since the potential duplicate may not be an exact match; so I would have to review each result.

I tried the following:

select L1.id, L1.company, L2.company, L1.address_1, L2.address1
from LIST1 L1
inner join LIST2 L2 on (L2.address1 like L1.address_1 + '%') or (L1.address_1 like L2.address1 + '%')
WHERE
L1.address_1 <> ''

but, it's not working correctly.

Thanks in advance for your help.

Suzanne
Nov 4 '08 #1
1 1410
ck9663
2,878 Expert 2GB
You're going to need a what we call Address Normalization algorithm. People actually sells these algorithm. So when you managed to create one, post it here, we'll be rich ;)

-- CK
Nov 5 '08 #2

Sign in to post your reply or Sign up for a free account.

Similar topics

4
by: Aaron W. West | last post by:
Timings... sometimes there are almost too many ways to do the same thing. The only significant findings I see from all the below timings is: 1) Integer math is generally fastest, naturally....
1
by: Steve Darby | last post by:
Hi there, I wonde if anyone can help me. I have a couple of problems which I am having trouble resolving. Firstly, I have written a form which includes field for First Name, Surname and...
2
by: Antitax | last post by:
I have a database with more than 800 adress records Some of the are similar because some letters in the street adress for example are not identical, altough they point to the same adress. Does...
2
by: perryche | last post by:
Experts, Bascially I want to avoid people entering duplicated records. However, the check for the duplication are in 2 separate fields. i.e. Field1 + Field2 can not be duplicated. What's the...
3
by: Bryan Warren via AccessMonster.com | last post by:
I am creating a new database, that will contain 2 fields that will have the same data. The DB is an inventory of computers. the fields I want to duplicate are the "serial number" and the "computer...
17
by: Wilfried | last post by:
Hi, I have 5 tables: main data1
14
by: Takeadoe | last post by:
Folks, I at the proverbial fork in the road and before I make a decision, I was hoping maybe I could get some advice. I'm in the process of moving data to Access. I'd like to use Excel for...
9
by: RMC | last post by:
Hello, I'm looking for a way to parse/format a memo field within a report. The Access 2000 database (application) has an equipment table that holds a memo field. Within the report, the memo...
4
by: Deus402 | last post by:
I am designing an employer database, and I had thought that my tabledesign was pretty sound, but a new revelation has ruined my design. What I guess i really need is a way to uniquely identify...
0
petepell
by: petepell | last post by:
Hello all, I am developing an application in VB 2008 that works with a SQL2005 DB to store and manipulate employee data. In one section of the app I want to be able to show a treeview of the...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.