Updating Street Address Parser

About once a year or so for the last 10 years, I update my street
address parser and I'm starting to look at it again. This parser
splits a street address line into its smallest common elements
(number, trailer, pre, name, suffix, post, unit, unit id). I always
start this update process by searching Google-Groups and Google-web
for anything new out there, but there is never very much.
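To make the element list concrete, here is a minimal Python sketch of that kind of split. It is only an illustration, not the poster's parser: the function name `parse_street_line`, the field key `unit_id`, and the tiny vocabularies are all assumptions, and a real parser would need much larger tables and many more special cases.

```python
import re
from typing import Dict

# Tiny illustrative vocabularies (assumed); real tables would be far larger.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
SUFFIXES = {"ST", "AVE", "RD", "BLVD", "DR", "LN", "CT", "HWY"}
UNIT_TYPES = {"APT", "STE", "UNIT", "#"}

FIELDS = ("number", "trailer", "pre", "name", "suffix", "post", "unit", "unit_id")


def parse_street_line(line: str) -> Dict[str, str]:
    """Split one street address line into its elements (hypothetical helper)."""
    parts = dict.fromkeys(FIELDS, "")
    # Upper-case, drop periods and commas, and make "#" its own token.
    tokens = re.sub(r"[.,]", " ", line.upper()).replace("#", " # ").split()

    # Unit and unit id: everything from the first unit keyword onward.
    for i, tok in enumerate(tokens):
        if tok in UNIT_TYPES and i > 0:
            parts["unit"] = tok
            parts["unit_id"] = " ".join(tokens[i + 1:])
            tokens = tokens[:i]
            break

    # House number with an optional trailer such as "123A" or "123 1/2".
    if tokens:
        m = re.fullmatch(r"(\d+)([A-Z]?)", tokens[0])
        if m:
            parts["number"], parts["trailer"] = m.group(1), m.group(2)
            tokens = tokens[1:]
            if tokens and re.fullmatch(r"\d/\d", tokens[0]):
                parts["trailer"] = tokens.pop(0)

    # Peel post-directional and suffix off the right, then the
    # pre-directional off the left, always leaving at least one name token.
    if len(tokens) > 1 and tokens[-1] in DIRECTIONS:
        parts["post"] = tokens.pop()
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        parts["suffix"] = tokens.pop()
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        parts["pre"] = tokens.pop(0)

    parts["name"] = " ".join(tokens)
    return parts
```

Peeling known pieces off both ends and treating whatever is left as the name keeps a line like "12 E St" from losing its street name: the "E" stays as the name because at least one token is always left over.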

Has anyone run into anything in their travels? (source code from the
unix/linux bunch, a whitepaper from some brainy University professor,
an author that wrote a few lines...)

I'm really not looking for any recommendations for commercial products.

Thanks
Tom
Nov 13 '05 #1
4 Replies


Tom,
We kept a database of common street names and address pieces by country.
Then we ran the address against our list, looking for our standardized
version of whatever the address was. If it was a new variant spelling of a
listed address part, the address was listed for review by our clerical staff.
Assuming we found all the parts of the address, our code then returned a
list of potential matches sorted by probability of a match. The last piece
of the puzzle was some code that said that, as long as the probability was
within an acceptable margin of error (+/- 5%), we would allow the code to
change the provided address to our standardized address. This way, when we
processed addresses, our clerical staff only had to deal with the small list
of problem addresses that the software could not clean up.
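A rough Python sketch of that workflow follows. Everything specific in it is an assumption made for illustration: the sample tables, the function names, the use of `difflib.SequenceMatcher` as a stand-in for the match probability, and the reading of the "+/- 5%" margin as a similarity score of at least 0.95.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical reference data: variant spelling -> standardized address part.
STANDARD_PARTS = {
    "STREET": "ST", "STR": "ST", "ST": "ST",
    "AVENUE": "AVE", "AV": "AVE", "AVE": "AVE",
    "NORTH": "N", "N": "N",
}
# Hypothetical list of already standardized addresses.
KNOWN_ADDRESSES = ["100 N MAIN ST", "100 N MAINE AVE", "250 OAK AVE"]

AUTO_ACCEPT = 0.95  # assumed interpretation of the 5% margin


def standardize_tokens(address: str) -> str:
    """Replace each known variant with its standardized form."""
    tokens = address.upper().replace(".", " ").replace(",", " ").split()
    return " ".join(STANDARD_PARTS.get(t, t) for t in tokens)


def rank_matches(address: str) -> List[Tuple[float, str]]:
    """Return (score, known address) pairs, best match first."""
    cleaned = standardize_tokens(address)
    scored = [
        (SequenceMatcher(None, cleaned, known).ratio(), known)
        for known in KNOWN_ADDRESSES
    ]
    return sorted(scored, reverse=True)


def clean_address(address: str) -> Tuple[Optional[str], List[Tuple[float, str]]]:
    """Auto-correct when the best match clears the threshold.

    Returns (standardized address, ranked candidates). The first item is
    None when the address should go to clerical review instead.
    """
    ranked = rank_matches(address)
    best_score, best = ranked[0]
    if best_score >= AUTO_ACCEPT:
        return best, ranked
    return None, ranked
```

With these sample tables, `clean_address("100 North Main Street")` standardizes to an exact match and is accepted automatically, while an address with no close match comes back with `None` and its ranked candidates for a person to look at.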

Nov 13 '05 #2

Alan,

How do you handle street address syntax if all you're doing is word
matching, or am I missing your point?

Tom

Nov 13 '05 #3

Also, does anyone have addresses (or lists of them) that your parser
doesn't handle properly?

Tom
Nov 13 '05 #4

Tom,
We winged it. As a first pass we stored the entire string in a column
alongside its approved standardized version. After that it got ugly.
BTW, the USPS has its list of addresses for the US for sale on CD. Check
www.usps.gov for info.
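The first pass described above can be sketched as a whole-string lookup. This is a guess at the shape of it, not the actual code: an in-memory dict stands in for the database column pair, and the names `approved`, `review_queue`, `lookup` and `approve`, as well as the whitespace and case normalization of the key, are assumptions.

```python
from typing import Dict, List, Optional

# Stand-ins for the database table (assumed): raw string -> approved version.
approved: Dict[str, str] = {}
# Raw strings seen so far that nobody has approved yet.
review_queue: List[str] = []


def normalize_key(raw: str) -> str:
    """Collapse case and whitespace so trivial variants share one key."""
    return " ".join(raw.upper().split())


def lookup(raw: str) -> Optional[str]:
    """Return the approved version, or queue the string for review."""
    key = normalize_key(raw)
    if key in approved:
        return approved[key]
    if key not in review_queue:
        review_queue.append(key)
    return None


def approve(raw: str, standardized: str) -> None:
    """Record a reviewed string and take it off the review queue."""
    key = normalize_key(raw)
    approved[key] = standardized
    if key in review_queue:
        review_queue.remove(key)
```

A table like this only ever recognizes strings it has already seen, so it does not answer the syntax question by itself.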

Nov 13 '05 #5