Bytes | Software Development & Data Engineering Community

Address regex help

Hello all,

I have a regex question. I want to split an address into discrete parts, so that

709 S Milton Ave is split into
number = 709
Direction = S
Name = Milton
Type = Ave

So I have the following regex

(?<number>^\d*(\s\w|\w|\-\w|\s\d/\d))\s(?<direction>(n\.|N\.|s\.|S\.|E\.|e\.|W\.|w\.|NE\.|ne\.|SE\.|se\.|NW\.|nw\.|SW\.|sw\.|n|N|s|S|E|e|W|w|NE|ne|SE|se|NW|nw|SW|sw|North|East|West|South|north|south|west|east)*)(?<street>(.*[^street|place|drive|st|pl|dr|ave|av])*)(?<type>.*)

It works for the following address:

709 S S Milton ave (as in 709 S South Milton ave)

because there the first S really is part of the number,

but it does not work for

709 S Milton ave

because it treats the S as part of the number and not as the direction.

Any ideas?
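To see why, here is the regex from the post run through Python's `re` module, as an illustration only: the newsgroup line-wrap damage is repaired, and the named groups are rewritten as `(?P<name>...)`, which is Python's spelling of the .NET `(?<name>...)` form. Two things show up. The number group tries `\s\w` first, so it grabs ` S`; because the rest of the pattern can still match with an empty direction (the group is `(...)*`), the engine has no reason to backtrack and hand the S to the direction. Separately, `[^street|place|...]` is a negated character class: it matches one character other than s, t, r, e, p, l, a, c, d, i, v or `|`, not "anything except these words".

```python
import re

# The pattern from the post, with line-wrap damage repaired and
# .NET-style (?<name>...) groups rewritten as Python's (?P<name>...).
DIRECTIONS = (
    r"n\.|N\.|s\.|S\.|E\.|e\.|W\.|w\.|NE\.|ne\.|SE\.|se\.|NW\.|nw\.|SW\.|sw\."
    r"|n|N|s|S|E|e|W|w|NE|ne|SE|se|NW|nw|SW|sw"
    r"|North|East|West|South|north|south|west|east"
)
ORIGINAL = re.compile(
    r"(?P<number>^\d*(\s\w|\w|\-\w|\s\d/\d))\s"
    r"(?P<direction>(" + DIRECTIONS + r")*)"
    r"(?P<street>(.*[^street|place|drive|st|pl|dr|ave|av])*)"
    r"(?P<type>.*)"
)

for address in ("709 S S Milton ave", "709 S Milton ave"):
    m = ORIGINAL.match(address)
    # number is '709 S' both times; direction is 'S' first, then '' (empty)
    print(repr(m.group("number")), repr(m.group("direction")),
          repr(m.group("street")), repr(m.group("type")))
```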

Jun 14 '06 #1

Without a database to check whether the city has a "South Milton
Avenue", it's ambiguous. Why isn't number "709 S" on "Milton Ave" as valid
as number "709" on "S Milton Ave"?
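If the client's rule is the one the original post implies, that a lone letter after the house number belongs to the number only when another direction follows it, a lookahead can encode that rule. A minimal sketch under that assumption, using Python's `re` (which spells named groups `(?P<name>...)`); the direction list is illustrative, and `DIRECTION`, `ADDRESS`, and `parse` are hypothetical names. It picks one reading by rule; it does not remove the ambiguity described above.

```python
import re

# Assumed direction spellings (illustrative, not exhaustive), longest first.
DIRECTION = r"(?:North|South|East|West|NE|NW|SE|SW|N|S|E|W)\.?"

ADDRESS = re.compile(
    rf"""
    ^(?P<number>\d+(?:\s[A-Z](?=\s{DIRECTION}\s))?)  # '709', or '709 S' only if a direction follows
    \s+(?:(?P<direction>{DIRECTION})\s+)?           # optional pre-directional
    (?P<street>.+?)                                 # street name (lazy)
    \s+(?P<type>[A-Z]+\.?)$                         # last word is taken as the street type
    """,
    re.VERBOSE | re.IGNORECASE,
)

def parse(address):
    """Return the four fields as a dict, or None if the line does not fit the pattern."""
    m = ADDRESS.match(address.strip())
    return m.groupdict() if m else None

print(parse("709 S Milton ave"))
print(parse("709 S S Milton ave"))
```

Input of any other shape, such as a P.O. box, fails to match and returns None, which is usually safer than a confident wrong split.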

Moreover, your regex is going to go crazy over
P.O. Box 6000
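One way to keep P.O. boxes away from a street pattern is to recognize them first and route them elsewhere. A minimal sketch, assuming only the common `P.O. Box`, `PO Box`, and `P O Box` spellings; `PO_BOX` and `po_box_number` are hypothetical names.

```python
import re

# Accepts "P.O. Box 6000", "PO Box 6000", "p o box 6000" (assumed spellings only).
PO_BOX = re.compile(r"^\s*P\.?\s*O\.?\s*Box\s+(?P<box>\w+)\s*$", re.IGNORECASE)

def po_box_number(line):
    """Return the box number if the line is a P.O. box, else None."""
    m = PO_BOX.match(line)
    return m.group("box") if m else None

for line in ("P.O. Box 6000", "PO Box 12345", "709 S Milton Ave"):
    print(line, "->", po_box_number(line))
```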

Jun 14 '06 #2
The first thing you've got to do is figure out all of the possible
combinations of tokens that may comprise an "address." You have apparently
noticed only one or two. In fact, an "address" can take many forms, and
include many combinations of abbreviations of various kinds. In addition,
the elements (tokens) of an address can appear in any number of orders,
particularly if these addresses come from different countries, and
especially if they have been provided by human beings rather than machines.

IOW, you've opened up a huge can of worms for yourself. What you need is not
just a regular expression, but a bit of AI to solve this problem. I have
seen it done, but I'm not sure *how* it's done. MapPoint and Google Maps can
do it fairly well, but Microsoft and Google have a lot of money to throw at
this sort of problem.
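The first step above, enumerating the token combinations you expect, can be made concrete. A minimal sketch, assuming US-style single-line input: each word is labeled from small lookup tables (illustrative subsets only), consecutive plain words are merged into one street-name token, and the resulting shape is looked up in a table of known shapes. Anything not in the table returns None for a human to review instead of being guessed at. All names here are hypothetical.

```python
# Illustrative lookup tables, far from complete.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW", "NORTH", "SOUTH", "EAST", "WEST"}
STREET_TYPES = {"ST", "STREET", "AVE", "AV", "AVENUE", "DR", "DRIVE", "PL", "PLACE", "BLVD"}

def classify(token):
    """Label one whitespace-separated token."""
    t = token.upper().rstrip(".")
    if t.isdigit():
        return "NUMBER"
    if t in DIRECTIONS:
        return "DIRECTION"
    if t in STREET_TYPES:
        return "TYPE"
    return "WORD"

def tokenize(address):
    """Label tokens, merging runs of plain words into a single street-name token."""
    tokens = []
    for raw in address.split():
        kind = classify(raw)
        if kind == "WORD" and tokens and tokens[-1][1] == "WORD":
            tokens[-1] = (tokens[-1][0] + " " + raw, "WORD")
        else:
            tokens.append((raw, kind))
    return tokens

# Known shapes: which field each token fills.
SHAPES = {
    ("NUMBER", "WORD", "TYPE"): ("number", "street", "type"),
    ("NUMBER", "DIRECTION", "WORD", "TYPE"): ("number", "direction", "street", "type"),
    # The "709 S S Milton ave" case: the first direction letter belongs to the number.
    ("NUMBER", "DIRECTION", "DIRECTION", "WORD", "TYPE"):
        ("number", "number", "direction", "street", "type"),
}

def parse(address):
    tokens = tokenize(address)
    fields = SHAPES.get(tuple(kind for _, kind in tokens))
    if fields is None:
        return None  # unknown shape: flag for review rather than guess
    result = {}
    for (text, _), field in zip(tokens, fields):
        result[field] = (result[field] + " " + text) if field in result else text
    return result
```

The design choice is to keep the rules in data rather than in one pattern: supporting another shape means adding a row to `SHAPES`, not rewriting a regex.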

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

A lifetime is made up of
Lots of short moments.

Jun 14 '06 #3
Thanks guys... a couple of responses:

1) 709 S | Milton Ave is not as valid as 709 | S | Milton Ave, because
they want the direction separate. 709 S is not the street number, 709
is; and S Milton is not the street, Milton is.

2) Kevin, yeah, that's what I was suspecting but not wanting to think about.
The alternative for the client is to have 4 separate fields in the UI,
[number] [direction] [street] [type], but I hate this because it's
not intuitive... or web standard.

Thanks for your input, guys.

mike

Jun 15 '06 #4
Keep in mind that addresses don't always follow that (or any similar)
format. Here are a few examples:

John Smith
Smith Enterprises
P.O. Box 12345
Anytown, Nebraska
00000

Jack and Jill Hill
RR 5 Box 909
Podunk, WI 12345-7890

MR S HOLMES
2978 W MAIN ST # 12
MINNEAPOLIS MN 23976-4542

May December
Bowers Holiday Village
Bldg 91 Apt. 2-A
12 31st Street
Baltimore, Maryland
79797
USA

Herrn
Günther Meyer
Goethestraße 25
20002 HAMBURG
Federal Republic of Germany

SGT NICK FURY
HEADQUARTERS COMPANY
7TH ARMY TRAINING CENTER
ATTN: AETT-AG
UNIT 28130
APO AE 09114

CUSTOMS ATTACHE
AMERICAN EMBASSY CARACAS
UNIT 4964
APO AA 34037

MS HELEN SAUNDERS
1010 CLEAR STREET
OTTAWA ON K1A 0B1
CANADA

MS JOYCE BROWNING
2045 ROYAL ROAD
06570 ST PAUL
FRANCE

MS JOYCE BROWNING
2045 ROYAL ROAD
LONDON WIP 6HQ
ENGLAND

RUFUS LANGDON
LAW DEPARTMENT
US POSTAL SERVICE
475 L'ENFANT PLZ SW RM 6627
WASHINGTON DC 202360-1120
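Two of the examples above carry parts the original regex has no group for: a secondary unit (`# 12`, `RM 6627`) and a post-directional (`SW` after `PLZ`). One approach is to peel those off the end of the delivery line before splitting the rest. A minimal sketch, assuming a single-line US delivery address; `UNIT_DESIGNATORS` is a hypothetical subset, and the USPS Publication 28 link below has the full list of designators.

```python
# Hypothetical subset of secondary-unit designators (see USPS Pub 28 for the full list).
UNIT_DESIGNATORS = {"#", "APT", "RM", "STE", "UNIT", "BLDG"}
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}

def split_delivery_line(line):
    """Peel a trailing secondary unit and post-directional off a US delivery line."""
    tokens = line.upper().split()
    unit = post_dir = None
    # "RM 6627" or "# 12": a designator followed by exactly one value token.
    if len(tokens) >= 2 and tokens[-2] in UNIT_DESIGNATORS:
        unit = " ".join(tokens[-2:])
        tokens = tokens[:-2]
    # A direction left at the end is a post-directional, as in "PLZ SW".
    if tokens and tokens[-1] in DIRECTIONS:
        post_dir = tokens[-1]
        tokens = tokens[:-1]
    return {"primary": " ".join(tokens), "post_direction": post_dir, "unit": unit}

print(split_delivery_line("475 L'ENFANT PLZ SW RM 6627"))
print(split_delivery_line("2978 W MAIN ST # 12"))
```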

I have found a few references for you. Again, though, this is a huge task.
There is commercial software out there that you can buy to do this sort of
parsing. Just Google for it. Here are some links to references:

http://www.columbia.edu/kermit/postal.html
http://pe.usps.com/text/pub28/welcome.htm
http://www.grcdi.nl/whitepapers.htm
http://aurora.regenstrief.org/v3dt/PAS.html
http://www.cicc.or.jp/english/hyoujy...tabook/219.htm
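USPS Publication 28 (second link) includes a table mapping the many ways people write a street type to one standard abbreviation, which could replace the `street|st|ave|av|...` list in the original regex. A minimal sketch with a hypothetical, hand-picked handful of variants; for real use, load the full table from the publication rather than trusting this subset.

```python
# Hand-picked variants for illustration only; not the full Pub 28 table.
SUFFIX_VARIANTS = {
    "AVE": {"AVE", "AV", "AVEN", "AVENUE"},
    "ST": {"ST", "STR", "STREET"},
    "DR": {"DR", "DRIV", "DRIVE"},
    "PL": {"PL", "PLACE"},
}

# Invert once so each lookup is a single dict access.
STANDARD_SUFFIX = {v: std for std, variants in SUFFIX_VARIANTS.items() for v in variants}

def normalize_suffix(word):
    """Return the standard abbreviation for a street type, or None if unknown."""
    return STANDARD_SUFFIX.get(word.upper().rstrip("."))

print(normalize_suffix("Avenue"), normalize_suffix("st."), normalize_suffix("Milton"))
```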

Good luck!

--
HTH,

Kevin Spencer

Jun 15 '06 #5
