By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
454,298 Members | 1,691 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 454,298 IT Pros & Developers. It's quick & easy.


P: 675
I want to search for names, but eliminate the small words, such as "The", "An", "Or", etc., and search for alternate spellings such as "Color" and "Colour". These words are located in tables.

I can search for a list of single characters by using square brackets [...]. Can I somehow incorporate the IN list?
LIKE "?[A,E,U]*[S,T]" returns all where the second letter is an "A", "E", or "U" and the last letter is either an "S" or a "T". Is there a way to substitute [Color, Colour] for the final [S,T]?
Aug 10 '09 #1
Share this Question
Share on Google+
19 Replies

Expert 100+
P: 1,287
I don't see any way to do this other than OR together the LIKE conditions, since the [ ] operators are for single characters only.
As a shortcut you could do:
LIKE "?[AEU]*Colo*r"
To be strict you'd have to do:
LIKE "?[AEU]*Color" OR LIKE "?[AEU]*Colour"
Aug 10 '09 #2

P: 675
The word "Color" was for illustration only. There are too many alternate spellings between British and American English to deal with simply. Then there is the Three/3/III alternates, and the And/& to mention a few. In movie titles, is it "Karate Kid 3", "The Karate Kid III", or ... Bird names have the American/British pairs and common misspellings when the entry is done by non-English speaking people.
When bird names were standardized worldwide for English, the words "Common", "Northern", "European", or "Eurasian" were added as a first word to many species where only one of the type (Starling, Teal, Harrier) exists in England but where these names are used with qualifiers elsewhere. USA has Blue-winged, Green-winged, and Cinnamon Teal. When Europeans post something and want to label it "Teal", they know that they need the qualifier, so they pick one. They don't use them, so they can't remember which is correct.
I've considered a Soundex feature for misspellings and simple spelling variations. After all, who can spell the color Vermillion correctly anyway? This would not handle Wigeon/Widgeon, different spellings for the same bird, so that is out.
Aug 10 '09 #3

Expert 100+
P: 1,287
I'm not sure what your original goal is. Would you keep a table of every word and it's alternate spellings?
Aug 10 '09 #4

P: 675
Yes, I would have a table tPairs
Expand|Select|Wrap|Line Numbers
  1. Correct   Alternate
  2. 3         III
  3. 3         Three
  4. And       &
  5. Color     Colour
  6. Vermilion Vermillion
  7. Wigeon    Widgeon
  8. Common    Eurasian
  9. Common    Northern
  10. Common    European
  11. ...
To search for "Vermillion Flycatcher", I would parse the entry, probably with "Split()" and then build in VBA the WHERE clause for the query.
Seemed like a good idea if it were supported in SQL. The square-bracket feature of LIKE allows a If it isn't, it probably isn't worth any more time thinking about it.
I have Googled this to death, and when I found that I could put an "!" within the [...] to mean "NOT", I began to wonder what else I could do with these []'s. Therefore, the question asked.
Aug 10 '09 #5

Expert 100+
P: 1,287
I have good news. There is a way to do this. You'll just have to play with this to see. You could make the funciton take a string to search and a string to match and return a boolean. Here's a quick reference to regular expresions so you can try some out. Note that this uses late binding, and if you used it extensively you might not want to.
Expand|Select|Wrap|Line Numbers
  1. Public Function MatchRegEx()
  3.     Dim RE As Object
  4.     Dim allMatches As Object
  5.     Dim strPattern As String
  6.     Dim strFindIn As String
  8.     strPattern = ".*(color|colour).*"
  9.     strFindIn = "the color red"
  11.     Set RE = CreateObject("vbscript.regexp")
  12.     RE.Pattern = strPattern
  13.     RE.IgnoreCase = True
  14.     RE.Global = True
  15.     Set allMatches = RE.Execute(strFindIn)
  16.     If allMatches.count > 0 Then
  17.         MsgBox "match"
  18.     Else
  19.         MsgBox "no match"
  20.     End If
  22. End Function
"the color red" matches
"the colour red" matches
"the coloor red' doesn't
Aug 10 '09 #6

P: 675
Thanks ChipR. I appreciate the time & energy you went thru.

This is way beyond my understanding of VBA. I don't understand the examples, much less the text in the link you supply. Although I could make your example work, I don't see how to expand it to a obj.RecordSource=SELECT X, Y, Z FROM tMyTable WHERE ???WhatGoesHere

I'm going to give this up. Thank you anyway for your time.
Aug 10 '09 #7

Expert 100+
P: 1,287
I just want to say that if you have the time, read a little bit about regular expressions. I'm sure anyone who uses them will agree that regular expressions are very much worth the small amount of effort required to become familiar with them.
This is just one example of something that was otherwise nearly impossible made easy with the regex object. You'll find more and more applications now let you use regular expressions to search and/or replace text.
Aug 11 '09 #8

P: 675
I Googled and read several articles. I can see what the descriptions say it is doing, but the examples were less than clear.
In your posted example, I see what you are doing, but really don't understand the pattern. But for now, OK, I go with it.
If strFindIn = "Common Clay-coloured Sparrow" , I want to know if it a valid entry in my bird species table, containing 10,000 rows. Here nothing is mis-spelled for simplicity, but "Common" might be "Northern" and "coloured" = "colored". The next search is "American Widgeon", or "European Bee Eater". Although I know that "Bee Eater" should be "Bee-eater" but is often "Beeeater", how can I build strPattern without already knowing what is there. To have a standard pattern would greatly exceed 4,000 characters.
Regular expressions, or regexes for short, are a way to match text with patterns.
But I don't have patterns. I have multiple spellings and a few common substitute words.
Names in a natural language, be they birds, movies, companies, or places do not make nice patterns.
Parsing dates - This really doesn't parse the date. It verifies that the date matches a given pattern. So you need to check the string against many patterns. Once you know the pattern (dates could match more than one pattern), then you have to actually parse it. You determine the pattern is "dd mmmyy" and now need code to parse, then verify, that this is a valid date. "77 Jan12" isn't valid, but would match the pattern. Now try "yy mmmdd"? that's valid, but probably wrong.
I appreciate your introducing me to regex, and your time to follow up with this subject. I just don't think its applicable for this.
Aug 11 '09 #9

Expert 100+
P: 1,287
I don't really see this as necessary for your project, but I've been experimenting since I've just discovered it. Don't feel pressured to use this stuff. I tested it out a bit, and it works pretty well for exactly what you asked for in the original post, but I'm not sure of the speed comparison with generating LIKE statements strung together by OR.

The regex doesn't remove the need for the tPairs table, it just allows you to search for the pairs all at once. To answer your previous question, remember that you can call a function in a select query. I used something like this:

MatchesRegEx(Forms!Whatever!txtPattern, [itemName]);

txtPattern is what you create on the fly based on your input. For example, with input "Common Clay-coloured Sparrow", you split out the words and find that coloured has alternate spellings in your tPairs. I think that was your intention at the start if you were going to have the table of alternate spellings. Your code assembles the pattern

txtPattern = "Common Clay-(colored|coloured) Sparrow"

I'm not saying the function to convert the input into the pattern using tPairs is trivial, but it's definitely doable.

Then the function would return a boolean, and your query results in only matches.
Expand|Select|Wrap|Line Numbers
  1. Public Function MatchesRegEx(strPattern As String, strFindIn As String) As Boolean
  2.     Dim i As Long
  3.     Dim RE As Object, allMatches As Object, aMatch As Object
  4.     Set RE = CreateObject("vbscript.regexp")
  5.     RE.Pattern = strPattern
  6.     RE.IgnoreCase = True
  7.     Set allMatches = RE.Execute(strFindIn)
  8.     If allMatches.count > 0 Then
  9.         MatchesRegEx = True
  10.     Else
  11.         MatchesRegEx = False
  12.     End If
  13. End Function
I'm not sure where the parsing of dates came from. Access has built in functions that do pattern matching on dates for you :)
Aug 11 '09 #10

P: 675
txtPattern = "Common Clay-(colored|coloured) Sparrow"
Is nice, but if I can build this, I already know that the word 'coloured' exists. therefore, building the string fragment "(colored|coloured)" will return true, always. This being known before calling any function, the call is not necessary. If not called, then I don't need to build the fragment. Or am I missing your point.
User functions in SQL have been very slow, in my experience. SELECT Len([Name]) AS NameLength takes significantly less time than SELECT MyFunction([Name]) AS NameLength, no matter what code is in MyFunction()
Currently, I am using DCount ("Key","tSpecies","Common Clay-coloured Sparrow"). If 1, I have an exact match. If >1, then probably a partial name like "Teal" where name is valid but incomplete and refers to many species. If 0, no match, and user must modify search string based on his knowledge, or external research. Various "Tools" become enabled depending on this result. >1 would enable a combobox with all names found, etc.
Aug 11 '09 #11

P: 675
What I was looking for was something such as "DCount ("Key","tSpecies","Name Like 'Gr[ae]y Jay' ")
This would return a 1 for the table field "Gray Jay", even if "Grey" were submitted.
Aug 11 '09 #12

Expert 100+
P: 1,287
Exactly. You knew the word 'colored' was in your data, but the user didn't. And you thought, let me put the pair colored-coloured in my tPairs in case they spell it wrong, then they'll still get a match. That was my point of using that input to build the pattern that would match both options.

I like the DCount idea, and I think you could accomplish the same goal by looping through alternate spellings and just adding to the total matches. Of course, it's all dependant on whether you want to create the tPairs table and put some common alternate spellings in it. I think it's a pretty cool idea you had.
Aug 11 '09 #13

P: 675
Exactly. You knew the word 'colored' was in your data, but the user didn't. And you thought, let me put the pair colored-coloured in my tPairs in case they spell it wrong, then they'll still get a match. That was my point of using that input to build the pattern that would match both options.
I am quite confused. This user types "...coloured...". To see if this is an alternate spelling, I do a strName = DLookup ("RealName", "tAltNames", "AltName="coloured") and get strName = "colored".
What do I do with strName now with regex ???
Seems more like I should do (with proper declares):
Expand|Select|Wrap|Line Numbers
  1. strInName = Replace(strInName,"-", " - ")
  2. varArray = Split(strInName)
  3. For I = 0 to UBound(varArray)
  4. strTemp = DLookup (.....)  This is like above DLookup, but with string concatinated into 'criteria'
  5. If Not IsNull(strTemp) Then varArray(i) = strTemp
  6. Next I
  7. strTemp = Join(varArray)
  8. strTemp = Replace(strTemp," - ", "-")
Aug 11 '09 #14

Expert 100+
P: 1,287
Here's what I'm thinking. The reason you can't just replace "coloured" in the search string with "colored" is because it's not always wrong. For example, if the user enters "Karate Kid 3" you may want to search for both "3" and "III."
This is where you use the values to generate the pattern, and just search for both.
Expand|Select|Wrap|Line Numbers
  1. strPattern = ""
  2. For i = 0 to UBound(varArray)
  3.     strTemp = DLookup(...)
  4.     If IsNull(strTemp) Then
  5.         strPattern = strPattern & varArray(i) & " "
  6.     Else
  7.         strPattern = strPattern & "(" & varArray(i) & "|" & strTemp & ") "
  8.     End If
  9. Next i
  10. 'trim off the last blank space from the pattern,
  11. 'may add ".*" to the beginning or end to allow any characters
I was actually picturing multiple alternate spellings, so possibly going through a recordset rather than just a DLookup, but that's just adding another loop.
Aug 12 '09 #15

P: 675
Thanks again ChipR. But I still don't get it. In post #6 you introduced 'regex' and from that point on I have not really seen how your suggestions apply to my problems.

'Regex' identifies patterns. It will tell me if my string matches a pattern or template. Although much more extensive than 'Like', it is still pattern matching. My problem is that if I know the pattern, I know the answer and do not need to use patterns. In my data, 'color' is the only spelling, and 'colour' will not find a match, so all input of 'colour' needs to be searched for as 'color'. For 'Karate Kid III', or K..K..3 either could be correct. So the table would be
Expand|Select|Wrap|Line Numbers
  1. Correct Alternate
  2. Color   Colour
  3. 3       III
  4. III     3
If I could have an SQL WHERE Name Like 'Karate Kid [3|III]' just as I have WHERE Name Like 'Gr[ae]y Jay', this would solve my problem. I could write a generic pre-search function to parse the string (to 'Grey' & 'Jay' or to 'Karate' & 'Kid' & 'III'), loop thru the words and see if any had alternates, and replace with pairings 'III|3|Three' or 'color|colour'. This looping will work for color/colour, but not for 3/III/Three' or 'Common/European/Eurasian/Northern'
Aug 12 '09 #16

Expert 100+
P: 1,287
If I could have an SQL WHERE Name Like 'Karate Kid [3|III]' just as I have WHERE Name Like 'Gr[ae]y Jay', this would solve my problem.
I don't see the difference between what you are asking for and the SELECT statement in Post #10 that does exactly this. I'm not sure what you mean when you say looping doesn't work for 3/III/Three, it's just a different bit of code to assemble a string that will match each part.
Aug 12 '09 #17

P: 675
From post #11
User functions in SQL have been very slow, in my experience. SELECT Len([Name]) AS NameLength takes significantly less time than SELECT MyFunction([Name]) AS NameLength, no matter what code is in MyFunction()
responds to this.

Using DLookup or DCount is already slow enough. To put a user function into DLookup cannot speed it up. Normally, when processing user input I don't worry about speed, as most processing, i.e.'Sub textbox_OnChange', because the typing cannot keep up with the computer. Here the response is already slow, even when pasting the entire name into a textbox, and not typing each letter and processing each.
Aug 12 '09 #18

Expert 100+
P: 1,287
I think you're right, the use of the function is going to be slower than the SQL up to a certain point when you have a lot of alternates or substitutions. It would be the speed of
WHERE matchesRegEx("Karate Kid (3|Three|III)", [name])
WHERE [name] = 'Karate Kid 3' OR [name] = 'Karate Kid Three' OR [name] = 'Karate Kid III'
I might do some testing eventually.
Aug 12 '09 #19

P: 675
For the moment, I'm going to put this on hold. Because with the birds I have Dates and Locations to deal with, as well as names, this is very involved. I will test various combinations for speed vs benefit, but am afraid that the human brain and some manual clicking can simplify the coding 90%.

For the movies, the queries are quite complex. I have a 'filter' dialog that may allow AND or OR for any or all of the following categories: Title(including multible), ReleaseYear or range, Seen(and Date), Rating(MPAA)(including multible), Genre(including multible), Act[or|ess](including multible) , Director(including multible), Description, Language, and so forth. I build this query in VBA and it may exceed 4K characters.
Here I might want all movies either starring Clint Eastwood, Directed by Clint Eastwood (but not his daughter) or Francis Ford Coppela or Stephen Spielberg (that name is mis-spelled, by the way), released since 2000, that I have not seen. Or all "R" or "PG-13" rated movies in the db not seen and in Drama or Romance.

Let's close this thread, and when I'm ready, I'll open a new thread "Using RegEx", coming soon to a forum near you.
Aug 12 '09 #20

Post your reply

Sign in to post your reply or Sign up for a free account.