Bytes | Software Development & Data Engineering Community

Speeding up a query

Hello again...

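This query works, but it is very slow. I suspect it's because the SQL evaluates the DamerauLevenshtein function more than once for each record (the same subquery appears once as t1 and again inside t2, and each copy calls the function in both the SELECT list and the WHERE clause), and the function is fairly intensive to begin with.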

I'm wondering whether moving the MIN() logic out of the SQL and into the DamerauLevenshtein function itself, in VBA, might speed things up. In other words, the function would identify the best match directly, instead of the query retrieving a list of possible matches, running the same function again to find the MIN, and picking the right row with a JOIN. That would mean passing the maximum error threshold (in this case, "<3") to the function instead of declaring it in the SQL; the function would use it to build an array of candidate results, pick the best one, and send it back to the SQL.

I hope that makes sense :/
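A minimal VBA sketch of that idea, under stated assumptions: BestInsMatch is a hypothetical name, not something from this thread; it assumes SK_Ins has a Full_Name text field, that the DAO library is referenced (the default in recent Access versions), and that the DamerauLevenshtein function below is in the same database. It loads the SK_Ins names once into Static variables, calls DamerauLevenshtein once per candidate, and returns the closest Full_Name whose distance is below intMaxDist (Null if there is none), so neither the MIN() nor the JOIN is needed.

```vba
' Hypothetical helper (not from the thread): returns the SK_Ins.Full_Name
' closest to varName with a distance below intMaxDist, or Null if none.
' Assumes a DAO reference and the asker's DamerauLevenshtein() function.
Public Function BestInsMatch(ByVal varName As Variant, ByVal intMaxDist As Integer) As Variant
    Static arrNames() As String    ' cached SK_Ins.Full_Name values
    Static lngCount As Long
    Static blnLoaded As Boolean
    Dim rs As DAO.Recordset
    Dim k As Long
    Dim intDist As Integer, intBest As Integer

    BestInsMatch = Null
    If IsNull(varName) Then Exit Function

    ' Read the candidate names once per session, not once per query row
    If Not blnLoaded Then
        Set rs = CurrentDb.OpenRecordset( _
            "SELECT Full_Name FROM SK_Ins WHERE Full_Name Is Not Null", dbOpenSnapshot)
        If Not rs.EOF Then
            rs.MoveLast                 ' makes RecordCount accurate
            lngCount = rs.RecordCount
            ReDim arrNames(lngCount - 1)
            rs.MoveFirst
            For k = 0 To lngCount - 1
                arrNames(k) = rs!Full_Name
                rs.MoveNext
            Next k
        End If
        rs.Close
        blnLoaded = True
    End If

    ' Keep only distances strictly below the threshold, like "<3" in the SQL
    intBest = intMaxDist
    For k = 0 To lngCount - 1
        ' CStr() passes copies, so the function cannot modify the cached names
        intDist = DamerauLevenshtein(CStr(varName), CStr(arrNames(k)))
        If intDist < intBest Then
            intBest = intDist
            BestInsMatch = arrNames(k)
            If intBest = 0 Then Exit For    ' exact match; nothing can beat it
        End If
    Next k
End Function
```

From a query it could be called as BestInsMatch(q_insCrosstab.[NAME], 3), which mirrors the "<3" threshold. Because the names are cached in Static variables, later edits to SK_Ins are not seen until the VBA project is reset. On a tie this keeps the first name found, whereas the JOIN in the original query returns every tied row.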

Anyway, here's the SQL:
    SELECT t1.*
    FROM (
        SELECT TCSDBOWNER_EMP.ID AS EMP_ID, TCSDBOWNER_EMP.LAST_NAME AS EMP_LNAME,
               TCSDBOWNER_EMP.FIRST_NAME AS EMP_FNAME, q_insCrosstab.[SK Insurance],
               TCSDBOWNER_EMP.EMAIL_ADR AS EMP_EMAIL, TCSDBOWNER_EMP.ACTIVE_FLAG AS ACTIVE,
               SK_Ins.[Full_Name],
               DamerauLevenshtein(q_insCrosstab.[NAME], SK_Ins.[Full_Name]) AS DamLev
        FROM SK_Ins, TCSDBOWNER_EMP, q_insCrosstab
        WHERE DamerauLevenshtein(q_insCrosstab.[NAME], SK_Ins.[Full_Name]) < 3
          AND TCSDBOWNER_EMP.ACTIVE_FLAG = "T"
        ORDER BY TCSDBOWNER_EMP.LAST_NAME, TCSDBOWNER_EMP.FIRST_NAME
    ) AS t1
    RIGHT JOIN (
        SELECT EMP_ID, MIN(DamLev) AS MinDamLev
        FROM (
            SELECT TCSDBOWNER_EMP.ID AS EMP_ID, TCSDBOWNER_EMP.LAST_NAME AS EMP_LNAME,
                   TCSDBOWNER_EMP.FIRST_NAME AS EMP_FNAME, q_insCrosstab.[SK Insurance],
                   TCSDBOWNER_EMP.EMAIL_ADR AS EMP_EMAIL, TCSDBOWNER_EMP.ACTIVE_FLAG AS ACTIVE,
                   SK_Ins.[Full_Name],
                   DamerauLevenshtein(q_insCrosstab.[NAME], SK_Ins.[Full_Name]) AS DamLev
            FROM SK_Ins, TCSDBOWNER_EMP, q_insCrosstab
            WHERE DamerauLevenshtein(q_insCrosstab.[NAME], SK_Ins.[Full_Name]) < 3
              AND TCSDBOWNER_EMP.ACTIVE_FLAG = "T"
            ORDER BY TCSDBOWNER_EMP.LAST_NAME, TCSDBOWNER_EMP.FIRST_NAME
        ) AS x
        GROUP BY EMP_ID
    ) AS t2 ON (t1.EMP_ID = t2.EMP_ID) AND (t1.DamLev = t2.MinDamLev);
And the VBA:
    Function DamerauLevenshtein(ByVal str1 As String, ByVal str2 As String, _
                                Optional ByVal intSize As Integer = 256) As Integer
        ' Unrestricted Damerau-Levenshtein distance (insert, delete, substitute,
        ' transpose). ByVal so the normalization below doesn't alter the caller's
        ' strings. intSize must exceed the largest character code after normalization.

        Dim intTotalLen As Integer, intLen1 As Integer, intLen2 As Integer
        Dim i As Integer, j As Integer, intMini As Integer
        Dim intDB As Integer, intI1 As Integer, intJ1 As Integer, intD As Integer
        Dim arrDistance() As Integer   ' typed arrays avoid Variant overhead
        Dim arrStr1() As Integer, arrStr2() As Integer, arrDA() As Integer

        'adding basAlphNum() to strip spaces and non-alpha chars
        str1 = basAlphNum(str1)
        str2 = basAlphNum(str2)
        'original code for DamerauLevenshtein() follows

        str1 = UCase$(str1)
        str2 = UCase$(str2)
        intLen1 = Len(str1)
        intLen2 = Len(str2)
        intTotalLen = intLen1 + intLen2
        ReDim arrStr1(intLen1)
        ReDim arrStr2(intLen2)
        ReDim arrDA(intSize)            ' ReDim zero-fills a typed numeric array
        ReDim arrDistance(intLen1 + 2, intLen2 + 2)
        arrDistance(0, 0) = intTotalLen

        ' Row/column 0 hold the "infinity" sentinel; row/column 1 the base cases
        For i = 0 To intLen1
            arrDistance(i + 1, 1) = i
            arrDistance(i + 1, 0) = intTotalLen
        Next

        For i = 1 To intLen1
            arrStr1(i - 1) = Asc(Mid$(str1, i, 1))
        Next

        For j = 0 To intLen2
            arrDistance(1, j + 1) = j
            arrDistance(0, j + 1) = intTotalLen
        Next

        For j = 1 To intLen2
            arrStr2(j - 1) = Asc(Mid$(str2, j, 1))
        Next

        For i = 1 To intLen1
            intDB = 0   ' last column in this row where the characters matched

            For j = 1 To intLen2
                intI1 = arrDA(arrStr2(j - 1))   ' last row holding str2's j-th char
                intJ1 = intDB

                If arrStr1(i - 1) = arrStr2(j - 1) Then
                    intD = 0
                    intDB = j
                Else
                    intD = 1
                End If

                ' substitution (or match)
                intMini = arrDistance(i, j) + intD
                ' insertion
                If intMini > arrDistance(i + 1, j) + 1 Then intMini = arrDistance(i + 1, j) + 1
                ' deletion
                If intMini > arrDistance(i, j + 1) + 1 Then intMini = arrDistance(i, j + 1) + 1
                ' transposition
                If intMini > arrDistance(intI1, intJ1) + i - intI1 + j - intJ1 - 1 Then _
                    intMini = arrDistance(intI1, intJ1) + i - intI1 + j - intJ1 - 1

                arrDistance(i + 1, j + 1) = intMini
            Next

            arrDA(arrStr1(i - 1)) = i
        Next

        DamerauLevenshtein = arrDistance(intLen1 + 1, intLen2 + 1)
    End Function
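Since DamerauLevenshtein is the expensive part, one cheap trick (not from this thread) is to skip it for pairs that cannot pass the threshold. Insertions and deletions each change the length by one, while substitutions and transpositions leave it unchanged, so the distance is never smaller than the difference in length. The sketch below uses a hypothetical name, DamLevWithin, and assumes basAlphNum is the asker's String-to-String helper and gives the same result when applied twice.

```vba
' Hypothetical pre-filter (not from the thread): returns the Damerau-Levenshtein
' distance if it is below intMaxDist, otherwise Null. Null inputs give Null.
Public Function DamLevWithin(ByVal varA As Variant, ByVal varB As Variant, _
                             ByVal intMaxDist As Integer) As Variant
    Dim s1 As String, s2 As String
    Dim intDist As Integer

    DamLevWithin = Null
    If IsNull(varA) Or IsNull(varB) Then Exit Function

    ' Normalize the same way DamerauLevenshtein does (assumes basAlphNum is
    ' the asker's String -> String helper and is safe to apply twice)
    s1 = basAlphNum(CStr(varA))
    s2 = basAlphNum(CStr(varB))

    ' Lower bound: each insertion or deletion changes the length by exactly
    ' one; substitutions and transpositions do not change it
    If Abs(Len(s1) - Len(s2)) >= intMaxDist Then Exit Function

    intDist = DamerauLevenshtein(s1, s2)
    If intDist < intMaxDist Then DamLevWithin = intDist
End Function
```

In the query, DamLevWithin(q_insCrosstab.[NAME], SK_Ins.[Full_Name], 3) Is Not Null would take the place of the "<3" test. How much it saves depends on how many name pairs differ in length by three or more characters after normalization.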
Edit: as always, I appreciate any feedback you have on how to improve my question or communicate more effectively. And thank you for any suggestions you might have on how to speed this up :)
Jan 20 '16 #1

✓ answered by Rabbit

Rabbit
12,516 Expert Mod 8TB
It may be quicker if you write out the results of the query to a temp table and then do the min query on that.
Jan 21 '16 #2
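Rabbit's suggestion could look roughly like this in VBA. The table names tmp_InsMatches and tmp_BestInsMatch and both procedure names are placeholders, and a DAO reference is assumed. Step 1 is the asker's inner query turned into a make-table query (without the ORDER BY, which does nothing useful there); the CInt() around the distance is an added assumption meant to make sure DamLev is stored as a number, so MIN() compares numerically. The expensive function is then evaluated only while filling tmp_InsMatches, instead of once for t1 and again for x.

```vba
' Hypothetical sketch of the temp-table approach. All table and procedure
' names are placeholders; assumes a DAO reference.
Private Sub DropTableIfExists(ByVal db As DAO.Database, ByVal strTable As String)
    On Error Resume Next                 ' ignore the error if the table is missing
    db.Execute "DROP TABLE [" & strTable & "]", dbFailOnError
    On Error GoTo 0
End Sub

Public Sub BuildBestInsMatches()
    Dim db As DAO.Database
    Set db = CurrentDb

    ' Step 1: run the expensive query once and store its rows
    DropTableIfExists db, "tmp_InsMatches"
    db.Execute _
        "SELECT TCSDBOWNER_EMP.ID AS EMP_ID, TCSDBOWNER_EMP.LAST_NAME AS EMP_LNAME, " & _
        "TCSDBOWNER_EMP.FIRST_NAME AS EMP_FNAME, q_insCrosstab.[SK Insurance], " & _
        "TCSDBOWNER_EMP.EMAIL_ADR AS EMP_EMAIL, TCSDBOWNER_EMP.ACTIVE_FLAG AS ACTIVE, " & _
        "SK_Ins.[Full_Name], " & _
        "CInt(DamerauLevenshtein(q_insCrosstab.[NAME], SK_Ins.[Full_Name])) AS DamLev " & _
        "INTO tmp_InsMatches " & _
        "FROM SK_Ins, TCSDBOWNER_EMP, q_insCrosstab " & _
        "WHERE DamerauLevenshtein(q_insCrosstab.[NAME], SK_Ins.[Full_Name]) < 3 " & _
        "AND TCSDBOWNER_EMP.ACTIVE_FLAG = 'T'", dbFailOnError

    ' Step 2: the MIN() query reads stored DamLev values instead of recomputing
    ' them (INNER JOIN is equivalent to the original RIGHT JOIN here, because
    ' every t2 row is derived from t1)
    DropTableIfExists db, "tmp_BestInsMatch"
    db.Execute _
        "SELECT t1.* INTO tmp_BestInsMatch " & _
        "FROM tmp_InsMatches AS t1 INNER JOIN " & _
        "(SELECT EMP_ID, MIN(DamLev) AS MinDamLev FROM tmp_InsMatches GROUP BY EMP_ID) AS t2 " & _
        "ON (t1.EMP_ID = t2.EMP_ID) AND (t1.DamLev = t2.MinDamLev)", dbFailOnError
End Sub
```

After running BuildBestInsMatches, tmp_BestInsMatch holds each employee's lowest-distance row(s). Only step 1 calls DamerauLevenshtein; step 2 is an ordinary GROUP BY and JOIN over stored values.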
That's a great idea, thank you!
Jan 21 '16 #3
