Updating Street Address Parser

About once a year for the last 10 years, I have updated my street
address parser, and I'm starting to look at it again. The parser
splits a street address line into its smallest common elements
(number, trailer, pre, name, suffix, post, unit, unit id). I always
start the update by searching Google Groups and the web for
anything new, but there is never very much.
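For readers who have not built one, here is a minimal Python sketch of splitting a single line into those eight fields. It is not Tom's parser. The lookup tables are tiny and hypothetical, `parse_address_line` and `ParsedAddress` are illustrative names, and reading "trailer" as a house-number suffix (1/2, A) and "pre"/"post" as pre- and post-directionals is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, deliberately tiny lookup tables; a real parser needs far larger ones.
DIRECTIONALS = {"N": "N", "NORTH": "N", "S": "S", "SOUTH": "S",
                "E": "E", "EAST": "E", "W": "W", "WEST": "W",
                "NE": "NE", "NW": "NW", "SE": "SE", "SW": "SW"}
SUFFIXES = {"ST": "ST", "STREET": "ST", "AVE": "AVE", "AVENUE": "AVE",
            "RD": "RD", "ROAD": "RD", "BLVD": "BLVD", "DR": "DR", "LN": "LN"}
UNITS = {"APT": "APT", "UNIT": "UNIT", "STE": "STE", "SUITE": "STE", "#": "#"}


@dataclass
class ParsedAddress:
    number: Optional[str] = None
    trailer: Optional[str] = None  # assumed: house-number suffix such as "1/2" or "A"
    pre: Optional[str] = None      # assumed: pre-directional, the "N" in "N Main St"
    name: Optional[str] = None
    suffix: Optional[str] = None
    post: Optional[str] = None     # assumed: post-directional, the "NW" in "Main St NW"
    unit: Optional[str] = None
    unit_id: Optional[str] = None


def parse_address_line(line: str) -> ParsedAddress:
    """Split one address line into the eight fields using positional rules."""
    # Upper-case, drop periods and commas, and split "#12" into "#" and "12".
    tokens: List[str] = re.findall(r"#|[^\s,.#]+", line.upper())
    out = ParsedAddress()

    # Unit designator plus its id at the end of the line, e.g. "APT 4" or "# 12".
    if len(tokens) >= 2 and tokens[-2] in UNITS:
        out.unit, out.unit_id = UNITS[tokens[-2]], tokens[-1]
        tokens = tokens[:-2]

    # House number, optionally followed by a letter ("12B") or a fraction ("123 1/2").
    if tokens:
        m = re.fullmatch(r"(\d+)([A-Z]?)", tokens[0])
        if m:
            out.number, out.trailer = m.group(1), m.group(2) or None
            tokens = tokens[1:]
            if tokens and re.fullmatch(r"\d+/\d+", tokens[0]):
                out.trailer = tokens[0]
                tokens = tokens[1:]

    # Peel post-directional, suffix and pre-directional off the remaining words,
    # but never consume the last word: "456 North St" keeps NORTH as the name.
    if len(tokens) > 2 and tokens[-1] in DIRECTIONALS and tokens[-2] in SUFFIXES:
        out.post = DIRECTIONALS[tokens.pop()]
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        out.suffix = SUFFIXES[tokens.pop()]
    if len(tokens) > 1 and tokens[0] in DIRECTIONALS:
        out.pre = DIRECTIONALS[tokens.pop(0)]
    out.name = " ".join(tokens) or None
    return out


if __name__ == "__main__":
    print(parse_address_line("123 1/2 N Main St NW Apt 4"))
    # ParsedAddress(number='123', trailer='1/2', pre='N', name='MAIN',
    #               suffix='ST', post='NW', unit='APT', unit_id='4')
```

Working from both ends inward is one way to get syntax-sensitive cases right (a word like "North" can be a directional or the street name itself) without writing a full grammar; real-world input will need much larger tables and more rules.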

Has anyone run into anything in their travels? (Source code from the
Unix/Linux bunch, a white paper from some brainy university professor,
an author who wrote a few lines...)

I'm really not looking for any recommendations for commercial products.

Thanks
Tom
Nov 13 '05 #1
Tom,
We kept a database of common street names and address pieces by country.
Then we ran each address against our list, looking for our standardized
version of whatever the address was. If it contained a new variant spelling
of a listed address part, the address was flagged for review by our clerical
staff. Assuming we found all the parts of the address, our code then returned
a list of potential matches sorted by probability of a match. The last piece
of the puzzle was code that allowed the provided address to be replaced with
our standardized address as long as the match probability was within an
acceptable margin of error (+/- 5%). This way, when we processed addresses,
our clerical staff only had to deal with the small list of problem addresses
that the software could not clean up.
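Alan's workflow can be sketched in a few lines of Python. This is an illustration, not his system: the variant table, the reference list and the function names are hypothetical; `difflib.SequenceMatcher.ratio()` stands in for the match probability (it is a 0-to-1 string-similarity score, not a probability); and treating the "+/- 5%" margin as "accept only scores of 0.95 or higher" is an interpretation.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Hypothetical reference table: every spelling we have seen -> our standard form.
# Street names are listed too, mirroring "common street names and address pieces".
STANDARD_PARTS: Dict[str, str] = {
    "NORTH": "N", "N": "N",
    "STREET": "ST", "STR": "ST", "ST": "ST",
    "AVENUE": "AVE", "AV": "AVE", "AVE": "AVE",
    "MAIN": "MAIN", "ELM": "ELM",
}

# Assumed reading of "+/- 5%": auto-accept only matches scoring 0.95 or higher.
AUTO_ACCEPT = 0.95


def standardize(address: str) -> Tuple[str, List[str]]:
    """Rewrite each word to its standard form; collect unrecognized words for review."""
    words: List[str] = []
    unknown: List[str] = []
    for tok in address.upper().replace(".", " ").replace(",", " ").split():
        if tok.isdigit():
            words.append(tok)                 # house numbers pass through unchanged
        elif tok in STANDARD_PARTS:
            words.append(STANDARD_PARTS[tok])
        else:
            words.append(tok)
            unknown.append(tok)               # possibly a new variant spelling
    return " ".join(words), unknown


def rank_matches(address: str, reference: List[str]) -> List[Tuple[float, str]]:
    """Score every reference address by string similarity (0..1), best first."""
    scored = [(SequenceMatcher(None, address, ref).ratio(), ref) for ref in reference]
    return sorted(scored, reverse=True)


def clean(address: str, reference: List[str]) -> Tuple[Optional[str], str]:
    """Return (replacement address or None, reason) for one input address."""
    std, unknown = standardize(address)
    if unknown:
        return None, "review: unknown parts " + ", ".join(unknown)
    ranked = rank_matches(std, reference)
    if ranked and ranked[0][0] >= AUTO_ACCEPT:
        return ranked[0][1], "auto-corrected"
    return None, "review: no confident match"


if __name__ == "__main__":
    reference = ["123 N MAIN ST", "125 N MAIN ST", "40 ELM AVE"]
    print(clean("123 North Main Street", reference))  # ('123 N MAIN ST', 'auto-corrected')
    print(clean("123 Nrth Main St", reference))       # (None, 'review: unknown parts NRTH')
    print(clean("127 N Main St", reference))          # (None, 'review: no confident match')
```

The third case shows why the threshold matters: a wrong house number still scores about 0.92 against its neighbours, so it goes to the clerks instead of being silently "corrected" to 123 or 125.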

"Tom Warren" <tw*@gate.net> wrote in message
news:f4**************************@posting.google.com...

Nov 13 '05 #2
Alan,

How do you handle street address syntax if all you're doing is word
matching? Or am I missing your point?

Tom

"Alan Webb" <kn*****@hotmail.com> wrote in message news:<e4********************@comcast.com>...

Nov 13 '05 #3
Also, does anyone have addresses (or lists of addresses) that your
parser doesn't handle properly?

Tom
Nov 13 '05 #4
Tom,
We winged it. As a first pass, we stored the entire address string in one
column and its approved standardized version in another. After that it got
ugly.
BTW, the USPS sells its list of US addresses on CD. Check www.usps.gov for
info.
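That first-pass table, one column for the raw string and one for its approved standardized version, can be sketched with Python's built-in `sqlite3` module. The table name `address_map`, the helper functions, and normalizing case and whitespace before storing the key are illustrative assumptions, not details from the thread.

```python
import sqlite3
from typing import Optional


def normalize_key(raw: str) -> str:
    """Upper-case and collapse whitespace so trivial variants share one row (an assumption)."""
    return " ".join(raw.upper().split())


def open_store() -> sqlite3.Connection:
    """Create an in-memory table mapping raw address strings to approved versions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE address_map (raw TEXT PRIMARY KEY, standardized TEXT NOT NULL)"
    )
    return conn


def approve(conn: sqlite3.Connection, raw: str, standardized: str) -> None:
    """Record (or overwrite) the clerk-approved standardized version of a raw string."""
    conn.execute(
        "INSERT OR REPLACE INTO address_map (raw, standardized) VALUES (?, ?)",
        (normalize_key(raw), standardized),
    )


def lookup(conn: sqlite3.Connection, raw: str) -> Optional[str]:
    """Return the approved version, or None if this string has never been approved."""
    row = conn.execute(
        "SELECT standardized FROM address_map WHERE raw = ?", (normalize_key(raw),)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_store()
    approve(conn, "123 north main street", "123 N MAIN ST")
    print(lookup(conn, "123  North Main Street"))  # 123 N MAIN ST
    print(lookup(conn, "125 North Main Street"))   # None -> send to review
```

An exact-string table like this only recognizes strings that have already been approved; every new variant still needs a parser or a person.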

"Tom Warren" <tw*@gate.net> wrote in message
news:f4**************************@posting.google.com...
Nov 13 '05 #5
