Updating Street Address Parser

About once a year or so for the last 10 years, I update my street
address parser and I'm starting to look at it again. This parser
splits a street address line into its smallest common elements
(number, trailer, pre, name, suffix, post, unit, unit id). I always
start this update process by searching Google-Groups and Google-web
for anything new out there, but there is never very much.
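
For readers who have not seen this kind of parser, here is a minimal Python sketch of the split described above. It is not Tom's parser. The meaning given to each element (trailer as a letter after the house number, pre and post as directionals, suffix as the street type) is an assumption, and the lookup tables and the name parse_street_line are hypothetical and deliberately tiny.

```python
import re

# Assumed, illustrative lookup tables; real lists are far longer.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
SUFFIXES = {"ST", "AVE", "RD", "BLVD", "DR", "LN", "CT"}
UNIT_TYPES = {"APT", "UNIT", "STE", "SUITE", "#"}


def parse_street_line(line):
    """Split one street address line into the eight elements named in the post."""
    parts = dict.fromkeys(
        ["number", "trailer", "pre", "name", "suffix", "post", "unit", "unit_id"], ""
    )
    # Drop periods and commas, upper-case, and split on whitespace.
    tokens = re.sub(r"[.,]", " ", line.upper()).split()

    # Leading house number with an optional single-letter trailer, e.g. "123A".
    if tokens:
        m = re.fullmatch(r"(\d+)([A-Z]?)", tokens[0])
        if m:
            parts["number"], parts["trailer"] = m.group(1), m.group(2)
            tokens = tokens[1:]

    # Trailing unit designator and unit id, e.g. "APT 4B".
    if len(tokens) >= 2 and tokens[-2] in UNIT_TYPES:
        parts["unit"], parts["unit_id"] = tokens[-2], tokens[-1]
        tokens = tokens[:-2]

    # Peel post-directional, then suffix, from the right, then pre-directional
    # from the left. Each step leaves at least one token for the street name.
    if len(tokens) > 1 and tokens[-1] in DIRECTIONS:
        parts["post"] = tokens.pop()
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        parts["suffix"] = tokens.pop()
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        parts["pre"] = tokens.pop(0)

    parts["name"] = " ".join(tokens)
    return parts
```

With these tables, parse_street_line("123A N Main St SW Apt 4B") yields number 123, trailer A, pre N, name MAIN, suffix ST, post SW, unit APT, unit id 4B. Anything the tables do not recognize simply stays in the name, which is where a real parser needs far more syntax rules.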

Has anyone run into anything in their travels? (Source code from the
Unix/Linux bunch, a whitepaper from some brainy university professor,
an author that wrote a few lines...)

I'm really not looking for any recommendations for commercial products.

Thanks
Tom
Nov 13 '05 #1
Tom,
We kept a database of common street names and address pieces by country.
Then we ran each address against that list, looking for our standardized
version of whatever the address was. If an address contained a new variant
spelling of a listed address part, it was flagged for review by our clerical
staff. Assuming we found all the parts of the address, our code then returned
a list of potential matches sorted by probability of a match. The last piece
of the puzzle was some code that said that, as long as the probability was
within an acceptable margin of error (+/- 5%), the code was allowed to change
the provided address to our standardized address. This way, when we processed
addresses, our clerical staff only had to deal with the small list of problem
addresses that the software could not clean up.
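
A rough Python sketch of that flow, for illustration only: score an address part against a reference list, sort candidates best-first, auto-accept above a cutoff, and queue everything else for clerical review. The post does not say how the probability was computed, so difflib's similarity ratio stands in for it here. The 0.95 cutoff is one assumed reading of the "+/- 5%" margin, and STANDARD_PARTS, rank_matches and standardize are hypothetical names.

```python
from difflib import SequenceMatcher

# Hypothetical reference list of standardized address parts.
STANDARD_PARTS = ["STREET", "AVENUE", "BOULEVARD", "ROAD", "DRIVE"]

# Assumed interpretation of the "+/- 5%" margin: accept scores of 0.95 or more.
AUTO_ACCEPT = 0.95


def rank_matches(part, standards=STANDARD_PARTS):
    """Return (standard, score) pairs sorted best-first.

    The score is difflib's similarity ratio, used as a stand-in for the
    match probability mentioned in the post.
    """
    scored = [
        (s, SequenceMatcher(None, part.upper(), s).ratio()) for s in standards
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def standardize(part, review_queue):
    """Replace the part with its standard form, or queue it for manual review."""
    best, score = rank_matches(part)[0]
    if score >= AUTO_ACCEPT:
        return best
    # Not confident enough: leave the input unchanged and flag it.
    review_queue.append(part)
    return part
```

Under these assumptions an exact match such as "street" is rewritten to STREET automatically, while a variant spelling such as "Bulevard" ranks BOULEVARD first but scores below the cutoff, so it lands in the review queue instead of being changed.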

"Tom Warren" <tw*@gate.net> wrote in message
news:f4**************************@posting.google.com...

Nov 13 '05 #2
Alan,

How do you handle street address syntax if all you're doing is word
matching, or am I missing your point?

Tom

"Alan Webb" <kn*****@hotmail.com> wrote in message news:<e4********************@comcast.com>...

Nov 13 '05 #3
Also, does anyone have addresses (or lists of them) that your parser
doesn't handle properly?

Tom
Nov 13 '05 #4
Tom,
We winged it. As a first pass we stored the entire string in a column
alongside its approved standardized version. After that it got ugly.
BTW--the USPS has its list of addresses for the US for sale on CD. Check
www.usps.gov for info.
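
The first pass described above (the whole address string kept next to its approved standardized version) can be sketched as a plain lookup. This is only an illustration: a Python dict stands in for the two database columns, and the key normalization and the names normalize_key and ApprovedAddresses are assumptions that the post does not mention.

```python
def normalize_key(raw):
    """Collapse case and whitespace so trivially different strings share a key.

    This normalization is an assumption; the post only says the entire
    string was stored.
    """
    return " ".join(raw.upper().split())


class ApprovedAddresses:
    """In-memory stand-in for a table of (raw string, approved version)."""

    def __init__(self):
        self._approved = {}

    def approve(self, raw, standardized):
        """Record the standardized version approved for a raw address string."""
        self._approved[normalize_key(raw)] = standardized

    def lookup(self, raw):
        """Return the approved version, or None if this string is unknown."""
        return self._approved.get(normalize_key(raw))
```

A string that has been approved once, such as "123 main  street", then resolves on every later sighting regardless of case or spacing, and a lookup that returns None marks an address that still needs parsing or review.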

Nov 13 '05 #5
