Finding Data duplication within two fields

I'm a beginner, so I will try and convey my questions as best as possible.

I have a database of approximately 20,000 companies - I will call this list LIST1, and I purchased a list to with approximately 5,000 companies - I will call this list LIST2.

I would like to take the address field from LIST2 and find any duplicates on the address field on LIST1, however the addresses do not match exactly. For example, the address on LIST1 might be 4610 TRENTON FRANKLIN ROAD, and then the address on LIST2 might be 4610 TRENTON FRANKLIN STE 2 - how would I find this potential duplicate?

I would need a select statement since the potential duplicate may not be an exact match; so I would have to review each result.

I tried the following:

select L1.id, L1.company, L2.company, L1.address_1, L2.address1
from LIST1 L1
inner join LIST2 L2 on (L2.address1 like L1.address_1 + '%') or (L1.address_1 like L2.address1 + '%')
WHERE
L1.address_1 <> ''

but, it's not working correctly.

Thanks in advance for your help.

Suzanne

Nov 4 '08 #1

Subscribe Post Reply

1410

ck9663

2,878

Expert 2GB

You're going to need a what we call Address Normalization algorithm. People actually sells these algorithm. So when you managed to create one, post it here, we'll be rich ;)

-- CK

Nov 5 '08 #2

Similar topics

Finding if a REAL is LIKE '%[49]9'

by: Aaron W. West | last post by:

Timings... sometimes there are almost too many ways to do the same thing. The only significant findings I see from all the below timings is: 1) Integer math is generally fastest, naturally....

Microsoft SQL Server

Form field duplication

by: Steve Darby | last post by:

Hi there, I wonde if anyone can help me. I have a couple of problems which I am having trouble resolving. Firstly, I have written a form which includes field for First Name, Surname and...

Javascript

Query for finding similar adress records in database

by: Antitax | last post by:

I have a database with more than 800 adress records Some of the are similar because some letters in the street adress for example are not identical, altough they point to the same adress. Does...

Microsoft Access / VBA

Prevent Duplication records in 2 fields in Table

by: perryche | last post by:

Experts, Bascially I want to avoid people entering duplicated records. However, the check for the duplication are in 2 separate fields. i.e. Field1 + Field2 can not be duplicated. What's the...

Microsoft Access / VBA

Field to Field data duplication

by: Bryan Warren via AccessMonster.com | last post by:

I am creating a new database, that will contain 2 fields that will have the same data. The DB is an inventory of computers. the fields I want to duplicate are the "serial number" and the "computer...

Microsoft Access / VBA

Data in different tables.

by: Wilfried | last post by:

Hi, I have 5 tables: main data1

Microsoft Access / VBA

A resaonable approach to managing data?

by: Takeadoe | last post by:

Folks, I at the proverbial fork in the road and before I make a decision, I was hoping maybe I could get some advice. I'm in the process of moving data to Access. I'd like to use Excel for...

Microsoft Access / VBA

parse or format data from memo field onto Access report

by: RMC | last post by:

Hello, I'm looking for a way to parse/format a memo field within a report. The Access 2000 database (application) has an equipment table that holds a memo field. Within the report, the memo...

Microsoft Access / VBA

Table Design Headache - How do I preserve data integrity?

by: Deus402 | last post by:

I am designing an employer database, and I had thought that my tabledesign was pretty sound, but a new revelation has ruined my design. What I guess i really need is a way to uniquely identify...

Microsoft Access / VBA

Nested hierarchial data in treeview

by: petepell | last post by:

Hello all, I am developing an application in VB 2008 that works with a SQL2005 DB to store and manipulate employee data. In one section of the app I want to be able to show a treeview of the...

Visual Basic 4 / 5 / 6

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing