"Dan Gidman" <da*******@gmai l.com> wrote in message
news:11******** **************@ g14g2000cwa.goo glegroups.com.. .
Okay all I have a problem. I have two list of adresses and phone
numbers. I do not have control over the contents of the lists The only
unique field between the two is the phone number. I need to be able to
inner join the two lists on phone number.
This would normally be straigt forward but the problem is that they are
formated different and one of them does't even have a control on the
formating.
*Phone numbers are US phone numbers only.
The one list (table) that does have a control uses this format
AAA3334444
where AAA is the area code
333 is the 3 digit prefix
4444 is the four digit suffix
the second list (table) does not have any standardized formating and
can be filled with extraneous spaces, parentheses and dashes not to
mention the leading 1 in some instances.
I thought that I could do some kind of regular expression to do a
comparison but I havn't as yet found a good resource to tell me how to
do it or if it is even possible. Or maybe break up the one I know has
a standardized format into something like this:
'%AAA%333%4444% ' and doing my comparison that way. however It is very
important that only those list items in both list that are truly the
same place be listed.
suggestions and solutions are apreciated
It probably depends how bad the data is - if the number and type of the
unwanted characters is relatively predictable, then you can achieve quite a
lot with REPLACE():
create table #t (pnum varchar(20))
insert into #t select '1-800-456 7890 '
insert into #t select '(800)- 456 7890'
insert into #t select '1 800 456 7890'
insert into #t select '+1 800 (456) 7890 '
insert into #t select '(800) (456) 7890'
select pnum from #t
/* Strip out unwanted characters */
update #t set pnum = replace(pnum, ' ', '')
update #t set pnum = replace(pnum, '(', '')
update #t set pnum = replace(pnum, ')', '')
update #t set pnum = replace(pnum, '-', '')
update #t set pnum = replace(pnum, '+', '')
/* Remove leading 1 */
update #t set pnum = stuff(pnum, 1, 1, '')
where left(pnum,1) = '1'
and len(pnum) = 11
select pnum from #t
But if it gets much more complicated than that, I would consider using
bcp.exe to export the data, clean it up with an external script in a
language that has better string functions than TSQL, then load it again.
It is possible to use regexes in TSQL, by instantiating the VBScript Regex
object using the sp_OA% procedures, but it's rather clumsy and requires
sysadmin permissions, so it's probably not a good general solution.
Simon