By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,446 Members | 3,120 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,446 IT Pros & Developers. It's quick & easy.

How to optimize a slow query?

P: 6
Hello, I'm new to creating databases and coding in general. Through research I have created a DB that works. However, I have filled it with test data and find that a few queries which are used to generate reports run slowly.

Each Query uses 1 or 2 main tables. One of the two tables holds 73,000 records and the other table holds 2,000,000 records.

A query that works directly on the data found in those two tables takes 30 seconds to execute in order to display 73,000 records. A second query that uses this main query takes roughly the same amount of time, while a third query takes roughly a minute. Both of the latter queries also display the 73000 records.

The queries must return data for all 73,000 records in the first table, while meeting criteria for the 2,000,000 records in the 2nd table.

I am using access 2007. With my lack of experience I do not know if I should consider these kinds of times excessive or not. This would be the time users of the front end would have to wait for their reports.

I anticipate that it would take roughly 10 years to fill the DB with that much data through normal use. So technically, if I archive the data and start fresh every few years it should never get that slow. Still, I would like to know if i'm doing things in a stupid way and if there are better ways to accomplish what I need.

If those times are not excessive, please save yourself some reading time by stopping here. Thank you in advance for your response. If however, those times stink, I shall post additional details for the courageous.



The purpose of the DB is to allow employees to log productivity. They enter information through a frontend, and can also view reports etc.

The first main table is called M_Associates_Daysworked with fields(this one has the 73000 records):

Daysworked_ID(autonum), Associate_ID(number,foreign key), Work_Date(Date/time), Hours_worked(number), Override(yes/no), and then two number fields Registration and Ambulances. There is one record for every day an employee works, and Associate_ID and Work_Date make up a composite key.

The second main table is called M_Associates_Productivity (this is the one with 2 million records) and has fields:

Associate_ID(Foreign key), Daysworked_ID(Foreign key), Account_Number(text 10 characters). The following 4 fields are Yes/No data type for employees to indicate the kind of work performed on each Account_Number.

The first query (Daily_Prod1) is supposed to count account numbers from table M_Associates_Productivity for each criteria and perform calculations based on that. This is the SQL for that:

Expand|Select|Wrap|Line Numbers
  1. SELECT M_Associates_Daysworked.Associate_ID, M_Associates_Daysworked.Work_Date, Abs(Sum(M_Associates_Productivity.[Registration])) AS Registrations, Abs(Sum(M_Associates_Productivity.[Ambulance])) AS Ambulances, M_Associates_Daysworked.Hours_Worked, Round(([Registrations]+(Nz([Ambulances])/2)+(Nz([Interviewed])/2+Nz([Keyed])/2))/[Hours_Worked],2) AS Productivity, Sum(M_Associates_Productivity.Copay) AS Collections, Abs(Sum([M_Associates_Productivity].[Interviewed])) AS Interviewed, Abs(Sum([M_Associates_Productivity].[Keyed])) AS Keyed, M_Associates_Daysworked.Daysworked_ID
  2. FROM M_Associates_Daysworked LEFT JOIN M_Associates_Productivity ON M_Associates_Daysworked.Daysworked_ID = M_Associates_Productivity.Daysworked_ID
  3. GROUP BY M_Associates_Daysworked.Associate_ID, M_Associates_Daysworked.Work_Date, M_Associates_Daysworked.Hours_Worked, M_Associates_Daysworked.Daysworked_ID;
  4.  

The second Query (Daily_Prod2) is supposed to retrieve records from M_Associates_Daysworked where the value of "Override" is -1 (Meaning I am trying to override the input found in M_Associates_Productivity with my own). The code here is:

Expand|Select|Wrap|Line Numbers
  1. SELECT M_Associates_Daysworked.Associate_ID, M_Associates_Daysworked.Work_Date, M_Associates_Daysworked.Registrations, M_Associates_Daysworked.Ambulances, M_Associates_Daysworked.Hours_Worked, Round(([Registrations]+(Nz([Ambulances])/2))/[Hours_Worked],2) AS Productivity, M_Associates_Daysworked.Daysworked_ID
  2. FROM M_Associates_Daysworked
  3. WHERE (((M_Associates_Daysworked.Override)=-1))
  4. GROUP BY M_Associates_Daysworked.Associate_ID, M_Associates_Daysworked.Work_Date, M_Associates_Daysworked.Registrations, M_Associates_Daysworked.Ambulances, M_Associates_Daysworked.Hours_Worked, Round(([Registrations]+(Nz([Ambulances])/2))/[Hours_Worked],2), M_Associates_Daysworked.Daysworked_ID
  5. HAVING (((M_Associates_Daysworked.Work_Date) Between Date() And DateAdd("yyyy",-2,Date())));
  6.  
The third query (Associates_Productivity) retrieves values for every day worked. If the value of override is -1 for a record, it uses the value found in query Daily_Prod2 . If the value is not -1, it uses the value found in query Daily_Prod1. It is this query that is used as a basis for reports or for one or two other queries for data purposes. Code here is:

Expand|Select|Wrap|Line Numbers
  1. SELECT M_Associates_Daysworked.Associate_ID, M_Associates_Daysworked.Work_Date, IIf([Override]=-1,[Daily_Prod2]![Registrations],[Daily_Prod1]![Registrations]) AS Registrations, IIf([Override]=-1,[Daily_Prod2]![Ambulances],[Daily_Prod1]![Ambulances]) AS Ambulances, M_Associates_Daysworked.Hours_Worked, IIf([Override]=-1,[Daily_Prod2]![Productivity],[Daily_Prod1]![Productivity]) AS Productivity, Daily_Prod1.Collections, IIf([Override]=-1,"",[Daily_Prod1]![Keyed]) AS Keyed, IIf([Override]=-1,"",[Daily_Prod1]![Interviewed]) AS Interviewed
  2. FROM (M_Associates_Daysworked LEFT JOIN Daily_Prod1 ON M_Associates_Daysworked.Daysworked_ID = Daily_Prod1.Daysworked_ID) LEFT JOIN Daily_Prod2 ON M_Associates_Daysworked.Daysworked_ID = Daily_Prod2.Daysworked_ID
  3. ORDER BY M_Associates_Daysworked.Work_Date DESC;
  4.  

My initial thought was that the IIF statements in this query were bogging things down, but I am at the limit of my current knowledge and skill level and am unsure.

If this is a cause for slowness, is there a better way to do this?

If you took the time to read all this, thank you. I also apologize profusely for its length. Any feedback will be taken happily.
Feb 21 '11 #1

✓ answered by NeoPa

As each record source used either has an index of [DaysWorked_ID], or is a query built on [DaysWorked_ID], I think you have the indexing point covered well enough.

The issue in Access of queries executing for noticeably long periods of time is not something you can generally avoid simply by various (and sometimes clever) optimisations. These can often help, but avoiding them completely is unattainable in any serious database (and yours gives every impression of being that). What I would suggest as a way of covering this issue is rather to ensure the operator is informed of what's going on so that they are comfortable with the concept, and are never tempted to break into the process due to ignorance and the fear that the process has failed when it doesn't respond in a timely manner. If you think this makes sense then please visit Progress Indicator in Access.

Share this Question
Share on Google+
14 Replies


NeoPa
Expert Mod 15k+
P: 31,494
Let me first say how surprised I was, when I checked your post-count (generally an indicator, though rough, of how experienced a poster a member is), to see it read only one post. I could easily have expected a much larger number and congratulate you on a very well laid out question. Such testing of your hypotheses so early on in the project is a very good sign, in my view, that you approach the subject seriously. Always gratifying to people like us answering questions.

In fact, I don't feel the times you report are at all excessive. That said, the fastidious way that you've approached the problem, and laid it out as well, interested me enough to read to the end.

I'll look at each query in turn and see if there are any comments to make (I may reformat the SQL for purposes of readability) :

[Daily_Prod1]
Expand|Select|Wrap|Line Numbers
  1. SELECT   tDW.Associate_ID
  2.        , tDW.Work_Date
  3.        , Abs(Sum(tP.Registration)) AS Registrations
  4.        , Abs(Sum(tP.Ambulance)) AS Ambulances
  5.        , tDW.Hours_Worked
  6.        , Round(([Registrations]+(Nz([Ambulances])/2)+(Nz([Interviewed])/2+Nz([Keyed])/2))/[Hours_Worked],2) AS Productivity
  7.        , Sum(tP.Copay) AS Collections
  8.        , Abs(Sum(tP.Interviewed)) AS Interviewed
  9.        , Abs(Sum(tP.Keyed)) AS Keyed
  10.        , tDW.Daysworked_ID
  11.  
  12. FROM     M_Associates_Daysworked AS tDW
  13.          LEFT JOIN
  14.          M_Associates_Productivity AS tP
  15.   ON     tDW.Daysworked_ID = tP.Daysworked_ID
  16.  
  17. GROUP BY tDW.Associate_ID
  18.        , tDW.Work_Date
  19.        , tDW.Hours_Worked
  20.        , tDW.Daysworked_ID
I would be surprised if line #6 didn't give you ambiguous reference messages. If it doesn't that would imply to me that the [Keyed] & [Interviewed] values you're expecting to use (presumably those defined in lines #8 & #9) are not actually the ones used. Notice in lines #3 & #4, the aliases defined don't clash (cause ambiguous references to) the fields from the table.

[Daily_Prod2]
Expand|Select|Wrap|Line Numbers
  1. SELECT   Associate_ID
  2.        , Work_Date
  3.        , Registrations
  4.        , Ambulances
  5.        , Hours_Worked
  6.        , Round(([Registrations]+(Nz([Ambulances])/2))/[Hours_Worked],2) AS Productivity
  7.        , Daysworked_ID
  8.  
  9. FROM     M_Associates_Daysworked
  10.  
  11. WHERE    ([Override] = -1)
  12.  
  13. GROUP BY Associate_ID
  14.        , Work_Date
  15.        , Registrations
  16.        , Ambulances
  17.        , Hours_Worked
  18.        , Round(([Registrations]+(Nz([Ambulances])/2))/[Hours_Worked],2)
  19.        , Daysworked_ID
  20.  
  21. HAVING   ([Work_Date] Between Date() And DateAdd('yyyy',-2,Date())
This looks like SQL created for you by Access. The HAVING clause is for aggregated data, but Access is not fussy when it handles that, so you may want to move that to your WHERE clause (as [Work_Date] is not an aggregated result). I admit also, to being a little confused about fields such as [Registrations]. In [Daily_Prod1] this is aggregated from tP.Registration. Is it found separately in [M_Associates_Daysworked]? Would be strange I feel.

[Associates_Productivity]
Expand|Select|Wrap|Line Numbers
  1. SELECT   tDW.Associate_ID
  2.        , tDW.Work_Date
  3.        , IIf([Override]=-1,qDP2.Registrations,qDP1.Registrations) AS Registrations
  4.        , IIf([Override]=-1,qDP2.Ambulances,qDP1.Ambulances) AS Ambulances
  5.        , tDW.Hours_Worked
  6.        , IIf([Override]=-1,qDP2.Productivity,qDP1.Productivity) AS Productivity
  7.        , qDP1.Collections
  8.        , IIf([Override]=-1,"",qDP1.Keyed) AS Keyed
  9.        , IIf([Override]=-1,"",qDP1.Interviewed) AS Interviewed
  10.  
  11. FROM    (M_Associates_Daysworked AS tDW
  12.          LEFT JOIN
  13.          Daily_Prod1 AS qDP1
  14.   ON     tDW.Daysworked_ID = qDP1.Daysworked_ID)
  15.          LEFT JOIN
  16.          Daily_Prod2 AS qDP2
  17.   ON     tDW.Daysworked_ID = qDP2.Daysworked_ID
  18.  
  19. ORDER BY tDW.Work_Date DESC
Nothing too much to say about this. Seems a pretty efficient way to link the threads together.
Feb 21 '11 #2

Rabbit
Expert Mod 10K+
P: 12,366
I only skimmed the question so pardon me if you already answered it somewhere in your question but what indexes did you create for your tables?
Feb 21 '11 #3

NeoPa
Expert Mod 15k+
P: 31,494
I didn't cover that very important point Rabbit. It's good that you have :-)
Feb 21 '11 #4

P: 27
I just worked with a client who needed to search 20 million+ records in multiple tables and counting. We tried everything to make it run at a decent speed. We finally gave up and made a utility that ran at regular intervals and created a index table to use for searching instead. Joining massive tables with complicated search patterns is always a tough process. I only ask you to consider that if it is at all applicable if you don't find another solution.
Feb 22 '11 #5

P: 6
Let me first of all thank you for your responses. For the first query I was fairly reckless in choosing names for the aliases, and now that you mention it I'm wondering why I didn't get the error you mentioned. I'll take a look at that to see if I can prevent future issues.

The fields Registrations and Ambulances do exist in the table M_Associates_Daysworked, and are also aggregated in the queries based on the table M_Associates_Productivity.

When employees perform work related to an account number, they would enter the account number(encounter_number) and indicate whether it was a "registration", "ambulance", "Interview" or "data entry(keyed)" by using a checkbox. This data is stored in the table M_Associates_Productivity. The count from the queries represents the totals of their input. I needed the account numbers as a way to verify their work.

The fields in table M_Associates_Daysworked are a way for management to override their input, or enter input on their behalf without the tedium of entering a multitude of account numbers. They would simply enter the total on a form. I should probably rename those table fields to prevent confusion in the future.

Indexes

Table M_Associates_Daysworked:

Associate_ID, Daysworked_ID (Unique), Work_Date and then the indexes on the composite key [Associate_ID/Work_Date]

Are the separate indexes on Associate_ID and Work_Date redundant?

Table M_Associates_Productivity:

Associate_ID, Daysworked_ID, Productivity_ID, Shift_ID then there is an index on a composite key made up of [Encounter_Number, Registration(cb), Ambulance(cb), Interviewed(cb), Keyed(cb)]

cb = check box

Initially that large composite key was supposed to prevent duplicate entries from being entered into the table. I later found a way to defeat that so I used code on the input form to serve that purpose. Due to this I am not sure I still need that composite key.

Productivity_ID was the autonumber field access initially gave for the table. It was supposed to be a stable index to use in table relationships. It is not currently being used in any relationships.

The index Shift_ID refers to the primary key for a static table with 4 entries. I did not include it previously since I didn't consider that it may have an impact.
Feb 22 '11 #6

NeoPa
Expert Mod 15k+
P: 31,494
As each record source used either has an index of [DaysWorked_ID], or is a query built on [DaysWorked_ID], I think you have the indexing point covered well enough.

The issue in Access of queries executing for noticeably long periods of time is not something you can generally avoid simply by various (and sometimes clever) optimisations. These can often help, but avoiding them completely is unattainable in any serious database (and yours gives every impression of being that). What I would suggest as a way of covering this issue is rather to ensure the operator is informed of what's going on so that they are comfortable with the concept, and are never tempted to break into the process due to ignorance and the fear that the process has failed when it doesn't respond in a timely manner. If you think this makes sense then please visit Progress Indicator in Access.
Feb 22 '11 #7

P: 6
NeoPa: I am trying my best with this database, and i'm thankful for your suggestion. I really had no idea I could use a progress indicator. I believe this, along with informing the users, would be an effective solution. I will work on implementing this.

Netdynamic & Rabbit: Thank you two also for your efforts regarding this.

Everyone else: Thanks for taking a look at my problem as well.

Case closed for me.
Feb 22 '11 #8

NeoPa
Expert Mod 15k+
P: 31,494
I'm very pleased you found that helpful. It's been a pleasure working with you on this.

If you find this approach interesting, another article, by a friend of mine, may also prove interesting - How to Use a Progress Meter in Access. You can choose which approach you prefer.
Feb 22 '11 #9

Rabbit
Expert Mod 10K+
P: 12,366
Here's a few more indexes you may want to create to speed it up a bit.

Hours_Worked, Override, Registrations, Ambulances

Basically, if you're going to join on it, filter it, or group by it, you want to create an index on it to speed up querying.
Feb 22 '11 #10

NeoPa
Expert Mod 15k+
P: 31,494
I'm not sure it's as simple as that Rabbit. Part of the optimisation process of any query is to determine the single most appropriate available index to use. Certainly, the issues you mention would come into the making of that choice, but my understanding at least, is that only one index is used for the query (for each record source to be specific). In this case I would expect everything to work from the [DaysWorked_ID] indices.
Feb 22 '11 #11

Rabbit
Expert Mod 10K+
P: 12,366
How the engine optimizes depends on the specific implementation but it should be able to use multiple indexes for a single record source.

What follows is from SQL Server but I assume it would work similarly in Access. The reason the example is in SQL Server is because I don't think Access has explain plan functionality.

For the query
Expand|Select|Wrap|Line Numbers
  1. SET SHOWPLAN_ALL ON
  2. go
  3.  
  4. select count(*) from defect group by role1, role2;
  5. go
  6.  
  7. set showplan_all off
  8. go
The generated plan shows that it scans both indexes for the one record source.
Expand|Select|Wrap|Line Numbers
  1.   select count(*) from defect group by role1, role2;
  2.   |--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1005])))
  3.        |--Hash Match(Aggregate, HASH:([Defect].[role1], [Defect].[role2]), RESIDUAL:([Defect].[role1]=[Defect].[role1] AND [Defect].[role2]=[Defect].[role2]) DEFINE:([Expr1005]=COUNT(*)))
  4.             |--Hash Match(Inner Join, HASH:([Bmk1000])=([Bmk1000]), RESIDUAL:([Bmk1000]=[Bmk1000]))
  5.                  |--Index Scan(OBJECT:([Defect].[ix_16782040]))
  6.                  |--Index Scan(OBJECT:([Defect].[ix_16782060]))
Feb 22 '11 #12

NeoPa
Expert Mod 15k+
P: 31,494
Interesting. I don't claim to read this but I can see that it appears to refer to two indices as you say.

I would infer from this that there are situations where the use of multiple indices can be beneficial. I would have thought these situations would be limited though. What do you think? I don't say for sure as I'm not well versed in such stuff. I generally try to work from the basic principles I do understand and extrapolate from there. It doesn't generally lead me too far astray. I would expect the abilities of modern optimisation processes to outstrip my understandings though, so there will be things I don't anticipate.
Feb 22 '11 #13

Rabbit
Expert Mod 10K+
P: 12,366
It's hard to say what's best. I assume two separate indexes would be better than only 1 index. However, the best case scenario would be to create a composite index consisting of all the fields used in a query. I think this would be the quickest but also the largest and slowest to update.
Feb 22 '11 #14

Rabbit
Expert Mod 10K+
P: 12,366
I suspect though that what's causing most of the delay is the grouping by that expression. And you really can't index that.
Feb 22 '11 #15

Post your reply

Sign in to post your reply or Sign up for a free account.