468,242 Members | 1,474 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 468,242 developers. It's quick & easy.

Finding Data duplication within two fields

I'm a beginner, so I will try and convey my questions as best as possible.

I have a database of approximately 20,000 companies - I will call this list LIST1, and I purchased a list to with approximately 5,000 companies - I will call this list LIST2.

I would like to take the address field from LIST2 and find any duplicates on the address field on LIST1, however the addresses do not match exactly. For example, the address on LIST1 might be 4610 TRENTON FRANKLIN ROAD, and then the address on LIST2 might be 4610 TRENTON FRANKLIN STE 2 - how would I find this potential duplicate?

I would need a select statement since the potential duplicate may not be an exact match; so I would have to review each result.

I tried the following:

select L1.id, L1.company, L2.company, L1.address_1, L2.address1
from LIST1 L1
inner join LIST2 L2 on (L2.address1 like L1.address_1 + '%') or (L1.address_1 like L2.address1 + '%')
L1.address_1 <> ''

but, it's not working correctly.

Thanks in advance for your help.

Nov 4 '08 #1
1 1273
2,878 Expert 2GB
You're going to need a what we call Address Normalization algorithm. People actually sells these algorithm. So when you managed to create one, post it here, we'll be rich ;)

-- CK
Nov 5 '08 #2

Post your reply

Sign in to post your reply or Sign up for a free account.

Similar topics

4 posts views Thread by Aaron W. West | last post: by
1 post views Thread by Steve Darby | last post: by
3 posts views Thread by Bryan Warren via AccessMonster.com | last post: by
17 posts views Thread by Wilfried | last post: by
reply views Thread by NPC403 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.