469,356 Members | 2,016 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,356 developers. It's quick & easy.

Matching field by 5 first characters

Hello,
I am trying to match tables into a query based on 5 first characters in one field. I used expression: =left([field name]), 5) however it does not return matched entries as I needed. Basically, I need to tell query match two fields based on first characters and and show all records from Table 1 and only those records from table 2 where first 5 letters of the fields are matching.

Here is my SQL from the query:

Expand|Select|Wrap|Line Numbers
  1. SELECT [Master - Account by SVP Consolidated].[Scode Description], Master_Missing_Fields.Account, [Master - Account by SVP Consolidated].[Ops SVP], [Master - Account by SVP Consolidated].[AM SVP (Global)]
  2. FROM Master_Missing_Fields LEFT JOIN [Master - Account by SVP Consolidated] ON (Master_Missing_Fields.Account =left([Master - Account by SVP Consolidated].[Scode Description], 5));
Mar 15 '11 #1
21 11256
TheSmileyCoder
2,321 Expert Mod 2GB
From what I can tell the SQL looks good enough. I tried setting up a few test tables and used a join like yours without problems.

When you say "however it does not return matched entries as I needed." what exactly do you mean? Try to be more speficic.
Mar 15 '11 #2
The issue I now have is that the fields that I am matching may contain characters that I need not only at the left but in the middle and end. I tried to use =right, and =mid expressions, but is there a better way to match one field to another by any part of the field? Let say I have :

Catwalk to match to Cat
And
Cat Walk to match to Cat
And
Ca twalk to match to Cat?

Makes sense?
Mar 16 '11 #3
NeoPa
32,185 Expert Mod 16PB
So, essentially you want SQL to recognise records as matched when the values in them don't really match, but a human might be able to recognise similarities between the two values. This sounds like the bones of a doctoral thesis.

I'm sorry. I think you're out of luck on this one. You could get somewhere with specific checks to handle specific situations, but nothing as general as you stipulate I'm afraid.
Mar 16 '11 #4
ADezii
8,800 Expert 8TB
@Yoni - I'm sure that I can come up with a strictly code-based solution, but I'm still a little hazy as to the exact nature of the request. Can you post some sample, accurate, data as well as what the results should be?
Mar 16 '11 #5
I suppose I was not clear. I would like to have a most closest match based on the data in fields. The match can contain 3-5 characters that can appear at the beginning or in the middle. I saw some posts on another site where people say it was possible by perhaps combining different expressions but I was not successful to implement them. Are there options to use WHERE vs Like in a query? Thanks
Mar 16 '11 #6
ADezii
8,800 Expert 8TB
Again, kindly post some sample Data along with expected Results, so that your request is crystal clear.
Mar 16 '11 #7
My apologies.
I attached both tables:

Table 1 (SVP_Account_Name)
Table2 (Account_Name_Incomplete)

I need to match Scode Description field from Table1 with Acct Name in Table2 so it can attach appropriate Ops SVP within the query. You will notice that Account Names in Table2 vary and may appear as exact or partial match based on one of the words in a name string. I guess may be "best possible match" would be a good solution for me?

Thank you in advance !
Attached Files
File Type: pdf TesterTable.pdf (8.0 KB, 422 views)
Mar 16 '11 #8
Rabbit
12,516 Expert Mod 8TB
Are you seriously trying to match up SCode to Acct Name? Why are these fields free text fields? You're going to have a hard time linking them up. We are talking Google level matching algorithms here.

I assume FacilitiesEast matches to East Facilities. But what does Facilities match to? I see two possibilities in there. And who knows what other weird matches you're going to need? I mean, I'm not even sure which one North America is supposed to match up to. It's probably Global North America but there's always the possibility the user meant NA Northeast Admin.

You won't be able to use a phoenetic algorithm. Your best bet is probably an n-gram technique, possibly a 2-gram at the word level. But if that's not sensitive enough then you'll need to go down to the character level.

Really, the solution is not a stopgap matching algorithm. You need to fix the data.
Mar 16 '11 #9
Hi. Agree, the data is very "fuzzy" since we inhereted this database. I am trying to at least get closest match between the two tables and "assign" Ops SVP and still show all non-matching records as well...
Mar 16 '11 #10
Rabbit
12,516 Expert Mod 8TB
What's the size of the dataset? If it's imperative to match all records, your only sure fire method is manually going through and matching it.

If you're just trying to match as many as possible, then there are lots of algorithms. You could use a simple one in which you check on a word by word basis. This wouldn't give many positive matches.

If you want as many positive matches as you can, I think your best bet is an n-gram algorithm. But beware that it's a fairly advanced algorithm and would require a lot of work to implement correctly. Unless, of course, you could find an existing library.
Mar 16 '11 #11
ADezii
8,800 Expert 8TB
@Yoni - What if the following Scenario comprised a Match (<==>):
  1. [Scode Description] and [Acct Name] are equal:
    1. Eliminations <==> Eliminations
    2. DBank <==> Dbank
    3. East Facilities <==> East Facilities
    4. Global North America <==> Global North America
  2. Greater than 50% of the Components in [Scode Description] are also contained in [Acct Name] in some manner:
    1. East Facilities <==> FacilitiesEast (both East and Facilites are contained in FacilitiesEast (100%), but Facilities would not be a Match since only Facilities in East Facilities in contained within (50%).
    2. Southfield Facilities ==> No Match in [Acct Name], since both Components are not contained within the [Acct Name].
    3. Global North America <==> North America (both North and America contained in [Acct Name] 66.6%.
P.S. - I do realize that this approach is rather simplistic, but the only other alternative as I see it is what Rabbit pointed out, more involved Algorithms. Matches could be written to a Results Table consisting of [Scode Description], [Acct Name], and [Ops SVP].
Mar 16 '11 #12
I think your second approach may work for my purpose. Would you by chance have a code that I can analyze and possibly use for this query?
Mar 16 '11 #13
ADezii
8,800 Expert 8TB
@Yoni - To whom are you referring to, Rabbit or ADezii, in your last Post?
Mar 16 '11 #14
Rabbit
12,516 Expert Mod 8TB
I think he's talking to you ADezii. He must have mistook your #2 as an alternate approach rather than a complementary approach.

Yoni, since the fields are free form, what do you intend to do about mispellings, typos, abbreviations, etc?
Mar 16 '11 #15
Thanks to all. I was replying to ADezii.
Mar 16 '11 #16
ADezii
8,800 Expert 8TB
@Yoni - I'll come up with something, but it will be by no means comprehensive. To be honest, I do not think that there is a comprehensive solution to your rather unique situation. Give me some time, and I'll see what I can come up with. The results will be displayed in an independent Table, and not in a Query per say, since I do not see an easy Method to incorporate the Logic into a Query. Stay tuned...
Mar 16 '11 #17
ADezii
8,800 Expert 8TB
@Yoni - Here is an Algorithm that I created based on what I previously stated in Post #12. I'll post the Code, which is well commented, as well as the Demo Database used for this Thread. Should you have any questions, please feel free to ask. You can make the Results more or less Restrictive by changing the Value of the conPERCENTAGE Constant, as well as the Relational Operator that proceeds it in Code Line #43.
Expand|Select|Wrap|Line Numbers
  1. Dim MyDB As DAO.Database
  2. Dim rst_1 As DAO.Recordset              'Table SVP_Account_Name
  3. Dim rst_2 As DAO.Recordset              'Table Account_Name_Incomplete
  4. Dim rstResults As DAO.Recordset         'Results Table
  5. Dim varRet                              'Variant Array to hold Elements of [Scode Description]
  6. Dim intCtr As Integer                   'Used to Loop thru Elements of [Scode Description]
  7. Dim intNumOfMatches As Integer          'Number of Matches (Elements in [Scode Description]
  8. Dim intOverallCtr As Integer            'Overall Counter (Number of Elements in [Scode Description])
  9. Const conPERCENTAGE As Single = 0.5     '> this Value on [Scode Description] Elements indicates Match
  10.  
  11. 'Clear the Results Table
  12. CurrentDb.Execute "DELETE * FROM tblResults", dbFailOnError
  13.  
  14. Set MyDB = CurrentDb()
  15. Set rst_1 = MyDB.OpenRecordset("SVP_Account_Name", dbOpenForwardOnly)
  16. Set rst_2 = MyDB.OpenRecordset("Account_Name_Incomplete", dbOpenSnapshot)
  17. Set rstResults = MyDB.OpenRecordset("tblResults", dbOpenDynaset, dbAppendOnly)
  18.  
  19. 'Match every Record ([Scode Description]] in SVP_Account_Name against every
  20. 'Record ([Acct Name])Record in Account_Name_Incomplete for any Partial Matches
  21. With rst_1
  22.   Do While Not .EOF
  23.     Do While Not rst_2.EOF
  24.       'Check for exact Matches first (Case-insensitive)
  25.       If ![Scode Description] = rst_2![Acct Name] Then
  26.         rstResults.AddNew
  27.           rstResults![scode] = ![Scode Description]
  28.           rstResults![Acct Name] = rst_2![Acct Name]
  29.           rstResults![Ops SVP] = ![Ops SVP]
  30.         rstResults.Update
  31.         'Debug.Print ![Scode Description], rst_2![Acct Name], ![Ops SVP]
  32.       Else      'No Match, parse [Scode Description], and see if > 50% of its
  33.                 'Elements are contained within [Acct Name]. In a prior Post, yoou
  34.                 'indicated that Spaces should be ignored, so let's account for that also
  35.         varRet = Split(![Scode Description], " ")
  36.           For intCtr = LBound(varRet) To UBound(varRet)
  37.             intOverallCtr = intOverallCtr + 1
  38.               If InStr(Replace(rst_2![Acct Name], " ", ""), varRet(intCtr)) > 0 Then
  39.                 intNumOfMatches = intNumOfMatches + 1
  40.               End If
  41.           Next
  42.             'Do > 50% of Elements in [Scode Description] match [Acct Name]
  43.             If (intNumOfMatches / intOverallCtr) > conPERCENTAGE Then    'Percentage Criteria for a 'MATCH'
  44.               rstResults.AddNew
  45.                 rstResults![scode] = ![Scode Description]
  46.                 rstResults![Acct Name] = rst_2![Acct Name]
  47.                 rstResults![Ops SVP] = ![Ops SVP]
  48.               rstResults.Update
  49.               'Debug.Print ![Scode Description], rst_2![Acct Name], ![Ops SVP]
  50.             End If
  51.             intNumOfMatches = 0: intOverallCtr = 0      'RESET, critical
  52.       End If
  53.         rst_2.MoveNext
  54.     Loop
  55.         rst_2.MoveFirst
  56.        .MoveNext
  57.   Loop
  58. End With
  59.  
  60. rst_1.Close
  61. rst_2.Close
  62. rstResults.Close
  63. Set rst_1 = Nothing
  64. Set rst_2 = Nothing
  65. Set rstResults = Nothing
  66.  
  67. 'Let's see the Results
  68. DoCmd.OpenTable "tblResults", acViewNormal, acReadOnly
  69. DoCmd.Maximize
Results Table (tblResults) from Data submitted in Post #9 via Attachment, plus one Test Record indicating Space elimination:
Expand|Select|Wrap|Line Numbers
  1. SCode                    Acct Name                 Ops SVP
  2. DBank                    Dbank                     Teri Muller
  3. East Facilities          East Facilities           Kathy Wu
  4. East Facilities          FacilitiesEast            Kathy Wu
  5. Eliminations             Eliminations              Jorge Carr
  6. Global North America     Global North America      Betty Dunn
  7. Global North America     North America             Betty Dunn
  8. Help Me Rhonda           Rh onda help              ADezii
  9.  
Attached Files
File Type: zip Find Matches.zip (19.1 KB, 160 views)
Mar 16 '11 #18
Rabbit
12,516 Expert Mod 8TB
Just a suggestion but you could generalize the function to the form:
Expand|Select|Wrap|Line Numbers
  1. Function matchPercentage(str1 As String, str2 As String) As Double
  2.    ...
  3.    matchPercentage = numMatches / numTerms
  4. End Function
Then you can call it from a SQL query.
Mar 16 '11 #19
ADezii
8,800 Expert 8TB
Thanks Rabbit for the suggestion. Actually, the Code is too long to be integrated into a Single Procedure anyway. I intentionally kept it in-line for clarification reasons, but in hindsight, it should probably have been splintered into two or more routines. I know that the Code is a little rogue, OK a lot rogue, but what is your honest opinion on this particular approach?
Mar 17 '11 #20
Rabbit
12,516 Expert Mod 8TB
It's a good approach. In essence, it is a simplified and specialized 1-gram technique at the word level. However, it is susceptible to spelling errors.

To take it further, what you could do is do a 2-gram at the letter level. For example, the 2-grams for the phrase "hi bob" is "$h", "hi", "i ", " b", "bo", "ob", "b#" where $ is the start of the string and # is the end of the string. Once you have all the distinct 2-grams, you do a frequency count. Then, when you want to see if another string is a match, you get the frequency count of its 2-grams and count how many 2-grams in one string match the 2-grams in the other string. This allows you to fudge the spelling a little bit depending on how high you want to set the threshold. The beauty of the n-gram method is that it can be indexed even though it can create giant indexes.

Also, I don't think it would be too difficult to port your function. Something like this could work. The code is off the top of my head so it may not work with a simple copy and paste.

Expand|Select|Wrap|Line Numbers
  1. Function FuzzyMatch(arrWords() As String, strToMatch As String) As Long
  2.    Dim intMatches As Integer
  3.    Dim i As Integer
  4.  
  5.    intMatches = 0
  6.    For i = LBound(arrWords) To UBound(arrWords)
  7.       If InStr(strToMatch, arrWords(i)) > 0 Then
  8.          intMatches = intMatches + 1
  9.       End If
  10.    Next
  11.  
  12.    FuzzyMatch = intMatches / Len(arrWords)
  13. End Function
Then you could call it in SQL like so
Expand|Select|Wrap|Line Numbers
  1. SELECT *
  2. FROM Table1
  3. WHERE FuzzyMatch(Split(Field1, " "), Field2) > 0.5
Mar 17 '11 #21
TheSmileyCoder
2,321 Expert Mod 2GB
The approach and method discussion is interesting in itself, but just out of curiosity, are you (Yoni) expecting to do this just this once time and clean up your data?

How many records are we talking about in total?
Mar 17 '11 #22

Post your reply

Sign in to post your reply or Sign up for a free account.

Similar topics

9 posts views Thread by Henrootje | last post: by
1 post views Thread by CARIGAR | last post: by
reply views Thread by zhoujie | last post: by
reply views Thread by suresh191 | last post: by
1 post views Thread by Marylou17 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.