473,322 Members | 1,473 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,322 software developers and data experts.

Matching field by 5 first characters

Hello,
I am trying to match tables into a query based on 5 first characters in one field. I used expression: =left([field name]), 5) however it does not return matched entries as I needed. Basically, I need to tell query match two fields based on first characters and and show all records from Table 1 and only those records from table 2 where first 5 letters of the fields are matching.

Here is my SQL from the query:

Expand|Select|Wrap|Line Numbers
  1. SELECT [Master - Account by SVP Consolidated].[Scode Description], Master_Missing_Fields.Account, [Master - Account by SVP Consolidated].[Ops SVP], [Master - Account by SVP Consolidated].[AM SVP (Global)]
  2. FROM Master_Missing_Fields LEFT JOIN [Master - Account by SVP Consolidated] ON (Master_Missing_Fields.Account =left([Master - Account by SVP Consolidated].[Scode Description], 5));
Mar 15 '11 #1
21 11920
TheSmileyCoder
2,322 Expert Mod 2GB
From what I can tell the SQL looks good enough. I tried setting up a few test tables and used a join like yours without problems.

When you say "however it does not return matched entries as I needed." what exactly do you mean? Try to be more speficic.
Mar 15 '11 #2
The issue I now have is that the fields that I am matching may contain characters that I need not only at the left but in the middle and end. I tried to use =right, and =mid expressions, but is there a better way to match one field to another by any part of the field? Let say I have :

Catwalk to match to Cat
And
Cat Walk to match to Cat
And
Ca twalk to match to Cat?

Makes sense?
Mar 16 '11 #3
NeoPa
32,556 Expert Mod 16PB
So, essentially you want SQL to recognise records as matched when the values in them don't really match, but a human might be able to recognise similarities between the two values. This sounds like the bones of a doctoral thesis.

I'm sorry. I think you're out of luck on this one. You could get somewhere with specific checks to handle specific situations, but nothing as general as you stipulate I'm afraid.
Mar 16 '11 #4
ADezii
8,834 Expert 8TB
@Yoni - I'm sure that I can come up with a strictly code-based solution, but I'm still a little hazy as to the exact nature of the request. Can you post some sample, accurate, data as well as what the results should be?
Mar 16 '11 #5
I suppose I was not clear. I would like to have a most closest match based on the data in fields. The match can contain 3-5 characters that can appear at the beginning or in the middle. I saw some posts on another site where people say it was possible by perhaps combining different expressions but I was not successful to implement them. Are there options to use WHERE vs Like in a query? Thanks
Mar 16 '11 #6
ADezii
8,834 Expert 8TB
Again, kindly post some sample Data along with expected Results, so that your request is crystal clear.
Mar 16 '11 #7
My apologies.
I attached both tables:

Table 1 (SVP_Account_Name)
Table2 (Account_Name_Incomplete)

I need to match Scode Description field from Table1 with Acct Name in Table2 so it can attach appropriate Ops SVP within the query. You will notice that Account Names in Table2 vary and may appear as exact or partial match based on one of the words in a name string. I guess may be "best possible match" would be a good solution for me?

Thank you in advance !
Attached Files
File Type: pdf TesterTable.pdf (8.0 KB, 479 views)
Mar 16 '11 #8
Rabbit
12,516 Expert Mod 8TB
Are you seriously trying to match up SCode to Acct Name? Why are these fields free text fields? You're going to have a hard time linking them up. We are talking Google level matching algorithms here.

I assume FacilitiesEast matches to East Facilities. But what does Facilities match to? I see two possibilities in there. And who knows what other weird matches you're going to need? I mean, I'm not even sure which one North America is supposed to match up to. It's probably Global North America but there's always the possibility the user meant NA Northeast Admin.

You won't be able to use a phoenetic algorithm. Your best bet is probably an n-gram technique, possibly a 2-gram at the word level. But if that's not sensitive enough then you'll need to go down to the character level.

Really, the solution is not a stopgap matching algorithm. You need to fix the data.
Mar 16 '11 #9
Hi. Agree, the data is very "fuzzy" since we inhereted this database. I am trying to at least get closest match between the two tables and "assign" Ops SVP and still show all non-matching records as well...
Mar 16 '11 #10
Rabbit
12,516 Expert Mod 8TB
What's the size of the dataset? If it's imperative to match all records, your only sure fire method is manually going through and matching it.

If you're just trying to match as many as possible, then there are lots of algorithms. You could use a simple one in which you check on a word by word basis. This wouldn't give many positive matches.

If you want as many positive matches as you can, I think your best bet is an n-gram algorithm. But beware that it's a fairly advanced algorithm and would require a lot of work to implement correctly. Unless, of course, you could find an existing library.
Mar 16 '11 #11
ADezii
8,834 Expert 8TB
@Yoni - What if the following Scenario comprised a Match (<==>):
  1. [Scode Description] and [Acct Name] are equal:
    1. Eliminations <==> Eliminations
    2. DBank <==> Dbank
    3. East Facilities <==> East Facilities
    4. Global North America <==> Global North America
  2. Greater than 50% of the Components in [Scode Description] are also contained in [Acct Name] in some manner:
    1. East Facilities <==> FacilitiesEast (both East and Facilites are contained in FacilitiesEast (100%), but Facilities would not be a Match since only Facilities in East Facilities in contained within (50%).
    2. Southfield Facilities ==> No Match in [Acct Name], since both Components are not contained within the [Acct Name].
    3. Global North America <==> North America (both North and America contained in [Acct Name] 66.6%.
P.S. - I do realize that this approach is rather simplistic, but the only other alternative as I see it is what Rabbit pointed out, more involved Algorithms. Matches could be written to a Results Table consisting of [Scode Description], [Acct Name], and [Ops SVP].
Mar 16 '11 #12
I think your second approach may work for my purpose. Would you by chance have a code that I can analyze and possibly use for this query?
Mar 16 '11 #13
ADezii
8,834 Expert 8TB
@Yoni - To whom are you referring to, Rabbit or ADezii, in your last Post?
Mar 16 '11 #14
Rabbit
12,516 Expert Mod 8TB
I think he's talking to you ADezii. He must have mistook your #2 as an alternate approach rather than a complementary approach.

Yoni, since the fields are free form, what do you intend to do about mispellings, typos, abbreviations, etc?
Mar 16 '11 #15
Thanks to all. I was replying to ADezii.
Mar 16 '11 #16
ADezii
8,834 Expert 8TB
@Yoni - I'll come up with something, but it will be by no means comprehensive. To be honest, I do not think that there is a comprehensive solution to your rather unique situation. Give me some time, and I'll see what I can come up with. The results will be displayed in an independent Table, and not in a Query per say, since I do not see an easy Method to incorporate the Logic into a Query. Stay tuned...
Mar 16 '11 #17
ADezii
8,834 Expert 8TB
@Yoni - Here is an Algorithm that I created based on what I previously stated in Post #12. I'll post the Code, which is well commented, as well as the Demo Database used for this Thread. Should you have any questions, please feel free to ask. You can make the Results more or less Restrictive by changing the Value of the conPERCENTAGE Constant, as well as the Relational Operator that proceeds it in Code Line #43.
Expand|Select|Wrap|Line Numbers
  1. Dim MyDB As DAO.Database
  2. Dim rst_1 As DAO.Recordset              'Table SVP_Account_Name
  3. Dim rst_2 As DAO.Recordset              'Table Account_Name_Incomplete
  4. Dim rstResults As DAO.Recordset         'Results Table
  5. Dim varRet                              'Variant Array to hold Elements of [Scode Description]
  6. Dim intCtr As Integer                   'Used to Loop thru Elements of [Scode Description]
  7. Dim intNumOfMatches As Integer          'Number of Matches (Elements in [Scode Description]
  8. Dim intOverallCtr As Integer            'Overall Counter (Number of Elements in [Scode Description])
  9. Const conPERCENTAGE As Single = 0.5     '> this Value on [Scode Description] Elements indicates Match
  10.  
  11. 'Clear the Results Table
  12. CurrentDb.Execute "DELETE * FROM tblResults", dbFailOnError
  13.  
  14. Set MyDB = CurrentDb()
  15. Set rst_1 = MyDB.OpenRecordset("SVP_Account_Name", dbOpenForwardOnly)
  16. Set rst_2 = MyDB.OpenRecordset("Account_Name_Incomplete", dbOpenSnapshot)
  17. Set rstResults = MyDB.OpenRecordset("tblResults", dbOpenDynaset, dbAppendOnly)
  18.  
  19. 'Match every Record ([Scode Description]] in SVP_Account_Name against every
  20. 'Record ([Acct Name])Record in Account_Name_Incomplete for any Partial Matches
  21. With rst_1
  22.   Do While Not .EOF
  23.     Do While Not rst_2.EOF
  24.       'Check for exact Matches first (Case-insensitive)
  25.       If ![Scode Description] = rst_2![Acct Name] Then
  26.         rstResults.AddNew
  27.           rstResults![scode] = ![Scode Description]
  28.           rstResults![Acct Name] = rst_2![Acct Name]
  29.           rstResults![Ops SVP] = ![Ops SVP]
  30.         rstResults.Update
  31.         'Debug.Print ![Scode Description], rst_2![Acct Name], ![Ops SVP]
  32.       Else      'No Match, parse [Scode Description], and see if > 50% of its
  33.                 'Elements are contained within [Acct Name]. In a prior Post, yoou
  34.                 'indicated that Spaces should be ignored, so let's account for that also
  35.         varRet = Split(![Scode Description], " ")
  36.           For intCtr = LBound(varRet) To UBound(varRet)
  37.             intOverallCtr = intOverallCtr + 1
  38.               If InStr(Replace(rst_2![Acct Name], " ", ""), varRet(intCtr)) > 0 Then
  39.                 intNumOfMatches = intNumOfMatches + 1
  40.               End If
  41.           Next
  42.             'Do > 50% of Elements in [Scode Description] match [Acct Name]
  43.             If (intNumOfMatches / intOverallCtr) > conPERCENTAGE Then    'Percentage Criteria for a 'MATCH'
  44.               rstResults.AddNew
  45.                 rstResults![scode] = ![Scode Description]
  46.                 rstResults![Acct Name] = rst_2![Acct Name]
  47.                 rstResults![Ops SVP] = ![Ops SVP]
  48.               rstResults.Update
  49.               'Debug.Print ![Scode Description], rst_2![Acct Name], ![Ops SVP]
  50.             End If
  51.             intNumOfMatches = 0: intOverallCtr = 0      'RESET, critical
  52.       End If
  53.         rst_2.MoveNext
  54.     Loop
  55.         rst_2.MoveFirst
  56.        .MoveNext
  57.   Loop
  58. End With
  59.  
  60. rst_1.Close
  61. rst_2.Close
  62. rstResults.Close
  63. Set rst_1 = Nothing
  64. Set rst_2 = Nothing
  65. Set rstResults = Nothing
  66.  
  67. 'Let's see the Results
  68. DoCmd.OpenTable "tblResults", acViewNormal, acReadOnly
  69. DoCmd.Maximize
Results Table (tblResults) from Data submitted in Post #9 via Attachment, plus one Test Record indicating Space elimination:
Expand|Select|Wrap|Line Numbers
  1. SCode                    Acct Name                 Ops SVP
  2. DBank                    Dbank                     Teri Muller
  3. East Facilities          East Facilities           Kathy Wu
  4. East Facilities          FacilitiesEast            Kathy Wu
  5. Eliminations             Eliminations              Jorge Carr
  6. Global North America     Global North America      Betty Dunn
  7. Global North America     North America             Betty Dunn
  8. Help Me Rhonda           Rh onda help              ADezii
  9.  
Attached Files
File Type: zip Find Matches.zip (19.1 KB, 183 views)
Mar 16 '11 #18
Rabbit
12,516 Expert Mod 8TB
Just a suggestion but you could generalize the function to the form:
Expand|Select|Wrap|Line Numbers
  1. Function matchPercentage(str1 As String, str2 As String) As Double
  2.    ...
  3.    matchPercentage = numMatches / numTerms
  4. End Function
Then you can call it from a SQL query.
Mar 16 '11 #19
ADezii
8,834 Expert 8TB
Thanks Rabbit for the suggestion. Actually, the Code is too long to be integrated into a Single Procedure anyway. I intentionally kept it in-line for clarification reasons, but in hindsight, it should probably have been splintered into two or more routines. I know that the Code is a little rogue, OK a lot rogue, but what is your honest opinion on this particular approach?
Mar 17 '11 #20
Rabbit
12,516 Expert Mod 8TB
It's a good approach. In essence, it is a simplified and specialized 1-gram technique at the word level. However, it is susceptible to spelling errors.

To take it further, what you could do is do a 2-gram at the letter level. For example, the 2-grams for the phrase "hi bob" is "$h", "hi", "i ", " b", "bo", "ob", "b#" where $ is the start of the string and # is the end of the string. Once you have all the distinct 2-grams, you do a frequency count. Then, when you want to see if another string is a match, you get the frequency count of its 2-grams and count how many 2-grams in one string match the 2-grams in the other string. This allows you to fudge the spelling a little bit depending on how high you want to set the threshold. The beauty of the n-gram method is that it can be indexed even though it can create giant indexes.

Also, I don't think it would be too difficult to port your function. Something like this could work. The code is off the top of my head so it may not work with a simple copy and paste.

Expand|Select|Wrap|Line Numbers
  1. Function FuzzyMatch(arrWords() As String, strToMatch As String) As Long
  2.    Dim intMatches As Integer
  3.    Dim i As Integer
  4.  
  5.    intMatches = 0
  6.    For i = LBound(arrWords) To UBound(arrWords)
  7.       If InStr(strToMatch, arrWords(i)) > 0 Then
  8.          intMatches = intMatches + 1
  9.       End If
  10.    Next
  11.  
  12.    FuzzyMatch = intMatches / Len(arrWords)
  13. End Function
Then you could call it in SQL like so
Expand|Select|Wrap|Line Numbers
  1. SELECT *
  2. FROM Table1
  3. WHERE FuzzyMatch(Split(Field1, " "), Field2) > 0.5
Mar 17 '11 #21
TheSmileyCoder
2,322 Expert Mod 2GB
The approach and method discussion is interesting in itself, but just out of curiosity, are you (Yoni) expecting to do this just this once time and clean up your data?

How many records are we talking about in total?
Mar 17 '11 #22

Sign in to post your reply or Sign up for a free account.

Similar topics

8
by: Synonymous | last post by:
Hello, Can regular expressions compare file names to one another. It seems RE can only compare with input i give it, while I want it to compare amongst itself and give me matches if the first x...
0
by: Dan Jones | last post by:
I have a database of books that was originally created as a flat file. Each record has a number of fields, including the authors name. I'm trying to convert the database to something a little more...
0
by: Dejan M. | last post by:
Please can you help me with this problem: In the next form, when I Add new person who is not registered in the base I had the massage: The Microsoft Jet database engine cannot find a record in...
0
by: .Net Sports | last post by:
I've been looking everywhere, but i can't locate any functions, or articles on the net or my resources, on how to open a file using vb.net, and then grabbing certain parts of the file, and putting...
9
by: Henrootje | last post by:
Hello all, I have a problem that is alike others I have seen but I am not able to apply the solutions I saw using Trim, Left, Right and so on. That is why I ask here. What is the case? I...
1
by: Dawn | last post by:
Can I have a criteria where, when I add new records only, that data must be entered in field A before data is entered in Field B? So if a user goes to field B and starts entering data before data...
7
by: Petepinca | last post by:
I have 2 Access tables. One has a field with a 4 character product id. Example data: "3210". The other table has the same product id's for matching records BUT has leading characters: "RD3210", etc....
5
by: Dean | last post by:
Hi, I have a table with non-unique identifiers. I need to take all the values with the same ID's and combine them into one field with a semicolon as a seperator. These values may exceed 255...
2
by: nofear | last post by:
I have 2 tables with 1 field each the filed of the fist table has a 3 digit number stored as text and the field of the second table has a 6 digit number stored as text I'd like to select each 6...
8
by: danon | last post by:
Hi evrey one needing you're hlp please can any one tel me how to programme a first characters uppercase in textbox thank you
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.