HI, NeoPa, I was wondering if you gave it a thought about my post, I know it is been awhile, but I'm trying this again, thanks in advance
Some serious consideration and rules would have to go into something like this. I am not a native of Peru so am not familiar with any rules that might be available for well formatting of addresses over there.
Normalising address data from denormalised data which ever way you look at it would have to follow a set of rules in order to be successful. When examining a 'string' of data like that 'any' set based logic would programatically need to know when to parse and when to wrap and when to add carriage return or line break characters and so on. You have already identified the necessity of it by the requirement of the posting.
This can be beset with problems of course, unless you have a mature set of rules to work by because given the computer will do exactly what it is told how would it know for instance 'when' to concatenate one word to another
AND in the proper context or conversely leave the currently examined word alone and proceed onto the next word and examine that until you get to the end of the string. This can be achieved eventually but invariably is only done so when all the possible permutations for anomolies that 'might' occur have been listed so that comparitively speaking any data can be adjusted on the fly to suit any output display
I am sure anyone viewing the examples posted will appreciate the mechanics of what I speak of. The startpoint for this it seems to me is to break what you have into its key elements namely separate words firstly parsing them out to see in actual fact
in what context they can be put back together successfully again in order to normalise to properly structured columns that contain the 'correct' type of data.
Now this is not the answer but what follows is at least an illustration of what I am eluding to. You will need to replicate this your end in order to see what I mean.
1) Create a table called tblAddress with a single field called 'Address' datasize text 100
2) Populate it with the example data you posted
3) Create these functions in a standard module
- Function CountWords(s) As Integer
-
'
-
' Counts words in a string separated by 1 or more spaces
-
'
-
Dim WC As Integer, i As Integer, OnASpace As Integer
-
If VarType(s) <> 8 Or Len(Trim(s)) = 0 Then
-
CountWords = 0
-
Exit Function
-
End If
-
WC = 0
-
OnASpace = True
-
For i = 1 To Len(s)
-
If Mid(s, i, 1) = " " Then
-
OnASpace = True
-
Else
-
If OnASpace Then
-
OnASpace = False
-
WC = WC + 1
-
End If
-
End If
-
Next i
-
CountWords = WC
-
End Function
-
-
Function GetWord(s, Indx As Integer)
-
'
-
' Extracts a word in text where words are separated by 1 or more spaces
-
'
-
Dim i As Integer, WC As Integer, Count As Integer, SPos As Integer, EPos As Integer, OnASpace As Integer
-
WC = CountWords(s)
-
If Indx < 1 Or Indx > WC Then
-
GetWord = Null
-
Exit Function
-
End If
-
Count = 0
-
OnASpace = True
-
For i = 1 To Len(s)
-
If Mid(s, i, 1) = " " Then
-
OnASpace = True
-
Else
-
If OnASpace Then
-
OnASpace = False
-
Count = Count + 1
-
If Count = Indx Then
-
SPos = i
-
Exit For
-
End If
-
End If
-
End If
-
Next i
-
EPos = InStr(SPos, s, " ") - 1
-
If EPos <= 0 Then EPos = Len(s)
-
GetWord = Mid(s, SPos, EPos - SPos + 1)
-
End Function
-
4) Create a new query and in the SQL window paste the following SQL and save the query as qryStringManipulation
-
-
SELECT tblAddressTest.Address, CountSWords([Address],Space(1)) AS WordCount, GetWord([Address],1) AS Word1, GetWord([Address],2) AS Word2, GetWord([Address],3) AS Word3, GetWord([Address],4) AS Word4, GetWord([Address],5) AS Word5, GetWord([Address],6) AS Word6, GetWord([Address],7) AS Word7
-
FROM tblAddressTest;
-
5) Run the query.
You will see a matrix display of the addresses as posted to the right of which will be a count of the words in each address and to the right of that seven columns of data with one word in each.
Now the logical question is this!
Which words can be brought back together to identify the physical individual entity aspects properly.? ie: can data contained in column (word2) be treated as house number? ...no it has both numeric data (the figure 10) below which is a combination of numeric and alpha (53-B) and below that the word 'Gran' Now we could invent all sorts of logic per row to say things like if word2 is numeric then combine with word1 to become the house number and so on but this is sophistication that only your data can determine what route you are obliged to take.
IMHO opinion it would be indeed by a superstar who could give you a solution that 'practically' solves this for you taking into account all possible combinations of all of your live data but reponding in a manner in this thread to only three lines of data as an example on which to base a solution. (
data cleansing software is commercially available, big bucks too..... as are
superstars, but I hasten to humble myself to not being one of them :)))
I know this is not much help and your table lookup is a direction that you are considering but even that will be working around the same logic pattern I have described here so the point remains the same in effect.
I hope this helps you to
some extent or at least gives you a different angle to consider!
Regards
Jim :)