The first thing you've got to do is figure out all of the possible
combinations of tokens that may make up an "address." You appear to have
noticed only one or two. In fact, an "address" can take many forms and
include many kinds of abbreviations. In addition, the elements (tokens) of an
address can appear in any number of orders, particularly if the addresses
come from different countries, and especially if they have been entered by
human beings rather than machines.
IOW, you've opened up a huge can of worms for yourself. What you need is not
just a regular expression, but a bit of AI to solve this problem. I have
seen it done, but I'm not sure *how* it's done. MapPoint and Google Maps can
do it fairly well, but Microsoft and Google have a lot of money to throw at
this sort of problem.
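To make the abbreviation point concrete: even a single kind of token, the compass direction, turns up as N, n, N., north, North and so on. Below is a minimal sketch in Python. It assumes a hand-built alias table that covers only English compass points, and DIRECTION_ALIASES and normalize_direction are hypothetical names. It maps each accepted spelling to one canonical form, so you don't have to spell out every case variant inside a regex alternation.

```python
# Hypothetical alias table: every spelling we choose to accept,
# mapped to one canonical direction. Real-world data needs far more.
DIRECTION_ALIASES = {
    "n": "N", "north": "N",
    "s": "S", "south": "S",
    "e": "E", "east": "E",
    "w": "W", "west": "W",
    "ne": "NE", "northeast": "NE",
    "nw": "NW", "northwest": "NW",
    "se": "SE", "southeast": "SE",
    "sw": "SW", "southwest": "SW",
}


def normalize_direction(token):
    """Return the canonical direction for a token like 'N.' or 'south', else None."""
    # Compare case-insensitively and ignore a trailing abbreviation period.
    return DIRECTION_ALIASES.get(token.lower().rstrip("."))


print(normalize_direction("N."))      # N
print(normalize_direction("south"))   # S
print(normalize_direction("Milton"))  # None
```

A table like this recognizes only what someone thought to put in it. That is why free-form, international addresses quickly outgrow hand-written rules.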
--
HTH,
Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist
A lifetime is made up of
Lots of short moments.
<mi***************@gmail.com> wrote in message
news:11*********************@f6g2000cwb.googlegroups.com...
Hello all,
I have a regex question. I want to split an address into discrete parts,
so that
709 S Milton Ave is split into
number = 709
Direction = S
Name = Milton
Type = Ave
So I have the following regex
(?<number>^\d*(\s\w|\w|\-\w|\s\d/\d))\s(?<direction>(n\.|N\.|s\.|S\.|E\.|e\.|W\.|w\.|NE\.|ne\.|SE\.|se\.|NW\.|nw\.|SW\.|sw\.|n|N|s|S|E|e|W|w|NE|ne|SE|se|NW|nw|SW|sw|North|East|West|South|north|south|west|east)*)(?<street>(.*[^street|place|drive|st|pl|dr|ave|av])*)(?<type>.*)
This works for the following address:
709 S S Milton ave (as in 709 S South Milton ave)
where the first S is part of the number. But it does not work for
709 S Milton ave
because it treats the S as part of the number rather than as the
direction.
Any ideas?
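Why the second address fails: in the number group, the suffix "(\s\w|\w|\-\w|\s\d/\d)" is not optional. For "709 S Milton ave", the first alternative, \s\w, grabs " S", and the direction group is left empty because it is allowed to match zero times. Separately, "[^street|place|drive|...]" is a character class: it excludes individual characters such as s, t and r, not whole words.

Below is one possible fix, sketched in Python. Note that Python spells named groups (?P<name>...) where .NET uses (?<name>...). The sketch assumes only the simple US layout of number, optional direction, name and optional type. It accepts a separate letter as part of the number only when another direction token follows it. DIRECTIONS, STREET_TYPES, ADDRESS_RE and parse_address are hypothetical names, and the word lists are deliberately incomplete.

```python
import re

# Hypothetical, deliberately short word lists; real data needs far more.
DIRECTIONS = r"(?:NE|NW|SE|SW|North|South|East|West|[NSEW])"
STREET_TYPES = r"(?:Street|St|Avenue|Ave|Av|Drive|Dr|Place|Pl)"

ADDRESS_RE = re.compile(
    rf"""
    ^(?P<number>\d+                               # house number digits
        (?:-?[A-Z]                                # suffix glued on: 709A or 709-A
          |\s\d/\d                                # fraction: 709 1/2
          |\s[A-Z](?=\s+{DIRECTIONS}\.?\s)        # separate letter, only if a direction follows
        )?                                        # the whole suffix is optional
    )
    \s+
    (?:(?P<direction>{DIRECTIONS})\.?\s+)?        # optional direction, with or without a period
    (?P<street>.+?)                               # street name, as short as possible
    (?:\s+(?P<type>{STREET_TYPES})\.?)?           # optional street type at the very end
    $
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_address(text):
    """Split a simple US street address into its named parts, or return None."""
    match = ADDRESS_RE.match(text.strip())
    return match.groupdict() if match else None


print(parse_address("709 S Milton Ave"))
# {'number': '709', 'direction': 'S', 'street': 'Milton', 'type': 'Ave'}
print(parse_address("709 S S Milton ave"))
# {'number': '709 S', 'direction': 'S', 'street': 'Milton', 'type': 'ave'}
```

The lookahead encodes the rule from the question: a lone letter after the digits is read as the direction unless a second direction follows it. Anything outside the word lists, or in a different order, still will not parse. That is the limitation described in the reply above.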