By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,842 Members | 2,368 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,842 IT Pros & Developers. It's quick & easy.

Finding Data duplication within two fields

P: 1
I'm a beginner, so I will try and convey my questions as best as possible.

I have a database of approximately 20,000 companies - I will call this list LIST1, and I purchased a list to with approximately 5,000 companies - I will call this list LIST2.

I would like to take the address field from LIST2 and find any duplicates on the address field on LIST1, however the addresses do not match exactly. For example, the address on LIST1 might be 4610 TRENTON FRANKLIN ROAD, and then the address on LIST2 might be 4610 TRENTON FRANKLIN STE 2 - how would I find this potential duplicate?

I would need a select statement since the potential duplicate may not be an exact match; so I would have to review each result.

I tried the following:

select L1.id, L1.company, L2.company, L1.address_1, L2.address1
from LIST1 L1
inner join LIST2 L2 on (L2.address1 like L1.address_1 + '%') or (L1.address_1 like L2.address1 + '%')
WHERE
L1.address_1 <> ''

but, it's not working correctly.

Thanks in advance for your help.

Suzanne
Nov 4 '08 #1
Share this Question
Share on Google+
1 Reply


ck9663
Expert 2.5K+
P: 2,878
You're going to need a what we call Address Normalization algorithm. People actually sells these algorithm. So when you managed to create one, post it here, we'll be rich ;)

-- CK
Nov 5 '08 #2

Post your reply

Sign in to post your reply or Sign up for a free account.