By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
459,576 Members | 1,360 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 459,576 IT Pros & Developers. It's quick & easy.

How do I Match Partial Keywords from "Table_GEO" with "Table_Main"?

P: 37
I am using Access 2007/2010. I have a large list of data (I will call it "Main), in which phrases contain a city or state, plus a keyword. I then have a table called "GEO" which will contain a static list of cities and states for matching purposes. My goal is to eliminate the "keyword" in the string in "Main", but leave the city/state behind. Examples:
  • New Jersey Personal Injury
  • Pennsylvania Medical Malpractice
  • Med Mal in New Jersey
  • PI attorney in Kentucky
I want to convert the above list to the following:
  • New Jersey
  • Pennsylvania
  • New Jersey
  • Kentucky
Again, I have a list of all of the cities/states on hand in Table GEO, for the keyword recognition to take place. Any help would be appreciated. Thank you.
Jul 17 '10 #1

✓ answered by nico5038

OK, to start with you need to use the:
like "*" & tblCity.CityState & "*"
to make sure all hits are found. This solves all three combinations you were experimenting with.

Best to solve this in stages:
1) Extract the matches and the length of the CityState field
2) Use the previous result to select the max length
3) Join queries from stage 1 and stage 2 by the Description and the (Max) length

Stage 1 give a query like:
Expand|Select|Wrap|Line Numbers
  1. SELECT tblMain.ID, tblMain.Description, Nz([CityState],"<Not found>") AS [Found], len(tblCity.CityState) AS CityStateLen
  2. FROM tblMain LEFT JOIN tblCity ON tblMain.Description like "*" & tblCity.CityState & "*"
  3.  
Stage 2 looks like:
Expand|Select|Wrap|Line Numbers
  1. SELECT ID, Description, Max(CityStateLen)
  2. FROM qryStage1
  3. GROUP BY tblMain.ID, tblMain.Description
  4.  
Stage 3 looks like:
Expand|Select|Wrap|Line Numbers
  1. SELECT qryStage1.Description, qryStage1.Found, qryStage1.CityStateLen
  2. FROM qryStage1 INNER JOIN qryStage2
  3. ON (qryStage1.CityStateLen = qryStage2.MaxOfCityStateLen) AND (qryStage1.Description = qryStage2.Description);
  4.  
Now missing matches will be dropped, so it's advisable to inspect those first. My experience with text fields is that not every user is entering City names as accurate as you would wish....

Nic;o)

Share this Question
Share on Google+
17 Replies


nico5038
Expert 2.5K+
P: 3,072
A special JOIN can solve this for you.

Just create a query joining rows when the Description is LIKE the city/state from your table. This would look like:

Expand|Select|Wrap|Line Numbers
  1. SELECT tblMain.Description, Nz([CityState],"<Not found>") AS [Found]
  2. FROM tblMain LEFT JOIN tblCity ON tblMain.Description like "*" & tblCity.CityState & "*";
  3.  
Guess the fieldnames need to be changed, but I guess you get the idea.

When the query is working, change it into a MakeTable query to get the final result.

Basically
Jul 18 '10 #2

P: 37
I'm not sure if a query would work here. I should have specified, there is not a one to one match. For example, I will have over 100,000 keywords to match against 5000+ or so cities in the US. This is why I am not sure if a join will work since there is really no way to join the two data sets one to one. I am thinking that the only way I can accomplish what I want is by using DAO, but I am not so sure? Maybe I am wrong? Do the data sets have to have identical rows to use a join query?
Jul 18 '10 #3

nico5038
Expert 2.5K+
P: 3,072
It will work as you need, number of rows don't matter.
Just check the way I constructed the JOIN. It's no 1:1, but a conditional one.

Mismatches will show because of the NZ() function like "<Not found>"

The only problem will be an overlap of city/state names like "New York" and "York", these will trigger the JOIN twice and result into two hits.
Having a unique ID in the "Main" table will enable you to solve this with a group by query.

Just try.

Nic;o)
Jul 18 '10 #4

P: 37
Ok you were right, this does work very nicely. But like you said, there is an overlapping problem. Since I am using state abbreviations, phrases such as "personal injury in New Jersey" could spit out "AL". I'm not really understanding how a unique identifier could help in this case? You said

The only problem will be an overlap of city/state names like "New York" and "York", these will trigger the JOIN twice and result into two hits.
Sure, this could help to resolve additional queries, however it won't help with accuracy. For example, how would the query know to choose New Jersey vs AL? What if it chooses AL first? Then I am out of luck.

If this does help, the way I was able to get this to work in excel is as follows:

Expand|Select|Wrap|Line Numbers
  1. =IF(ISNA(LOOKUP(9999,SEARCH(" "&City&" "," "&G2&" "),City)),"Statewide",LOOKUP(9999,SEARCH(" "&City&" "," "&G2&" "),City))
"City" represents the array of cities, and G2 is the target keyword string. To account for exact matching, I used the following queries

" "&City&" "
" "&G2&" "

I'm not sure how to translate this into Access?
Jul 18 '10 #5

P: 37
Ok, after doing some more research, I came up with the following possibilities:

like "* " & geo.CityState
like geo.CityState & " *"
like "* " & geoCityState & " *"

I can "sorta" get all functionality working now, but there are still problems with queries like "york, and new york". Anyway, I don't know how to combine the above queries into one query. I tried this:

Expand|Select|Wrap|Line Numbers
  1.  ...like "* " & geo.CityState or "* " & geoCityState & " *" or geo.CityState & " *";
But when I use the "or" condition, I continue to get errors. So technically, I can create three queries and merge all 3 results into one, but there must be some way to combine all statements into one...
Jul 18 '10 #6

nico5038
Expert 2.5K+
P: 3,072
OK, to start with you need to use the:
like "*" & tblCity.CityState & "*"
to make sure all hits are found. This solves all three combinations you were experimenting with.

Best to solve this in stages:
1) Extract the matches and the length of the CityState field
2) Use the previous result to select the max length
3) Join queries from stage 1 and stage 2 by the Description and the (Max) length

Stage 1 give a query like:
Expand|Select|Wrap|Line Numbers
  1. SELECT tblMain.ID, tblMain.Description, Nz([CityState],"<Not found>") AS [Found], len(tblCity.CityState) AS CityStateLen
  2. FROM tblMain LEFT JOIN tblCity ON tblMain.Description like "*" & tblCity.CityState & "*"
  3.  
Stage 2 looks like:
Expand|Select|Wrap|Line Numbers
  1. SELECT ID, Description, Max(CityStateLen)
  2. FROM qryStage1
  3. GROUP BY tblMain.ID, tblMain.Description
  4.  
Stage 3 looks like:
Expand|Select|Wrap|Line Numbers
  1. SELECT qryStage1.Description, qryStage1.Found, qryStage1.CityStateLen
  2. FROM qryStage1 INNER JOIN qryStage2
  3. ON (qryStage1.CityStateLen = qryStage2.MaxOfCityStateLen) AND (qryStage1.Description = qryStage2.Description);
  4.  
Now missing matches will be dropped, so it's advisable to inspect those first. My experience with text fields is that not every user is entering City names as accurate as you would wish....

Nic;o)
Jul 18 '10 #7

P: 37
Everything above works great, thanks so much. There is just one tiny problem now. Very short strings, such as state abbreviations still pose a problem with this code:

like "*" & tblCity.CityState & "*"

Example: "personal injury" spits out "al" for alabama. I think the way to eliminate this problem is by including spaces in the matching queries like I had above. Thoughts?
Jul 18 '10 #8

nico5038
Expert 2.5K+
P: 3,072
Guess you stumbled upon the end of automating the solution of your problem.
In situations like this I revert to the manual check and update of the source text to make it fit. There won't be a 100% proof way to detect all automatically. That's also why I mentioned typo's in City Names. Only human eyes (and intelligence) can solve all.

Going further technically will probably cost more time than manually checking the odd matches (or mismatches)

Nic;o)
Jul 18 '10 #9

P: 37
Actually I figured out the problem, I needed this code to give me what I need:

Expand|Select|Wrap|Line Numbers
  1. like "* " & geo.State or  main.keywords like "* " & geo.State & " *" or main.keywords like geo.State & " *";
Now missing matches will be dropped, so it's advisable to inspect those first.
Actually, is there a way to keep missing matches in this data set? A missing term in this case indicates something important.
Jul 19 '10 #10

nico5038
Expert 2.5K+
P: 3,072
Yes, by forcing a value for the length when no city is found.

Nic;o)
Jul 19 '10 #11

P: 37
That's what I thought, but when I tried it, the numbers got all screwy. Here is what I tried, for the first query:

Expand|Select|Wrap|Line Numbers
  1. ...nz(geo.State,"national") AS [found], nz(len(geo.State),1) AS StateLen
So 1's are being placed where blanks are, however when I run query 2, the max lengths are screwed up. The problem is with the nz function, because as soon as I took out the nz function, the max lengths were fine again for query 2.
Jul 19 '10 #12

P: 37
Since this problem is very important for me to fix, I attached my testing database file. All you have to do is run UpdateState1, then UpdateState2 (well, of course you will know this :)). You will see how UpdateState2 does not match UpdateState1 in State Length when I force values into the null fields using the nz function in UpdateState1. But when I erase the nz function in UpdateState1, values are perfectly normal. I'm not sure if this is an Access bug, or maybe I am just not seeing something here that the nz function is doing?
Attached Files
File Type: zip testing.zip (26.7 KB, 59 views)
Jul 19 '10 #13

nico5038
Expert 2.5K+
P: 3,072
You need to use an IIF inquery UpdateState1 like:

Expand|Select|Wrap|Line Numbers
  1. SELECT zoomrank.ID, zoomrank.keywords, nz(geo.State,"national") AS [found], IIF(isnull(geo.State),999,len(geo.State)) AS StateLen
  2. FROM zoomrank LEFT JOIN geo ON zoomrank.keywords like "* " & geo.State or  zoomrank.keywords like "* " & geo.State & " *" or zoomrank.keywords like geo.State & " *";
  3.  
Nic;o)
Jul 19 '10 #14

NeoPa
Expert Mod 15k+
P: 31,769
You may get away with adding the surrounding spaces manually and doing just the single match :
Expand|Select|Wrap|Line Numbers
  1. SELECT zoomrank.ID
  2.      , zoomrank.keywords
  3.      , Nz(geo.State,'National') AS [Found]
  4.      , Len(Nz(geo.State, 'X')) AS StateLen
  5.  
  6. FROM   zoomrank LEFT JOIN
  7.        geo
  8.   ON   ' ' & zoomrank.keywords & ' ' Like '* ' & geo.State & ' *'
(I think.)
Jul 19 '10 #15

P: 37
Thank you two for the alternate solutions. And I want to especially thank Nico for his kind assistance in helping me with this project.
Jul 19 '10 #16

nico5038
Expert 2.5K+
P: 3,072
Glad we could help, success with your application !

Nic;o)
Jul 19 '10 #17

NeoPa
Expert Mod 15k+
P: 31,769
Yes. Definitely Nico. I really just played around with his already produced answer.
Jul 19 '10 #18

Post your reply

Sign in to post your reply or Sign up for a free account.