Hello, I'm new to creating databases and to coding in general. Through research I have created a DB that works. However, I have filled it with test data and find that a few queries used to generate reports run slowly.
Each query uses one or two main tables. One of the two tables holds 73,000 records and the other holds 2,000,000 records.
A query that works directly on the data found in those two tables takes 30 seconds to display 73,000 records. A second query that uses this main query takes roughly the same amount of time, while a third query takes roughly a minute. Both of the latter queries also display the 73,000 records.
The queries must return data for all 73,000 records in the first table, while meeting criteria against the 2,000,000 records in the second table.
I am using Access 2007. With my lack of experience I do not know whether I should consider these kinds of times excessive. This is the time users of the front end would have to wait for their reports.
I anticipate that it would take roughly ten years to fill the DB with that much data through normal use. So technically, if I archive the data and start fresh every few years, it should never get that slow. Still, I would like to know if I'm doing things in a stupid way and whether there are better ways to accomplish what I need.
If those times are not excessive, please save yourself some reading time by stopping here. Thank you in advance for your response. If, however, those times stink, I shall post additional details for the courageous.
The purpose of the DB is to allow employees to log productivity. They enter information through a front end, and can also view reports, etc. The first main table is called M_Associates_Daysworked (this is the one with the 73,000 records) and has these fields:
Daysworked_ID (AutoNumber), Associate_ID (Number, foreign key), Work_Date (Date/Time), Hours_Worked (Number), Override (Yes/No), and then two Number fields, Registration and Ambulances. There is one record for every day an employee works, and Associate_ID and Work_Date make up a composite key. The second main table is called M_Associates_Productivity (this is the one with the 2,000,000 records) and has these fields:
Associate_ID (foreign key), Daysworked_ID (foreign key), Account_Number (Text, 10 characters). The following four fields are Yes/No data type, for employees to indicate the kind of work performed on each Account_Number.
The first query (Daily_Prod1) is supposed to count account numbers from table M_Associates_Productivity for each criterion and perform calculations based on those counts. This is the SQL for it:

SELECT M_Associates_Daysworked.Associate_ID, M_Associates_Daysworked.Work_Date,
       Abs(Sum(M_Associates_Productivity.[Registration])) AS Registrations,
       Abs(Sum(M_Associates_Productivity.[Ambulance])) AS Ambulances,
       M_Associates_Daysworked.Hours_Worked,
       Round(([Registrations]+(Nz([Ambulances])/2)+(Nz([Interviewed])/2+Nz([Keyed])/2))/[Hours_Worked],2) AS Productivity,
       Sum(M_Associates_Productivity.Copay) AS Collections,
       Abs(Sum([M_Associates_Productivity].[Interviewed])) AS Interviewed,
       Abs(Sum([M_Associates_Productivity].[Keyed])) AS Keyed,
       M_Associates_Daysworked.Daysworked_ID
FROM M_Associates_Daysworked
     LEFT JOIN M_Associates_Productivity
     ON M_Associates_Daysworked.Daysworked_ID = M_Associates_Productivity.Daysworked_ID
GROUP BY M_Associates_Daysworked.Associate_ID, M_Associates_Daysworked.Work_Date,
         M_Associates_Daysworked.Hours_Worked, M_Associates_Daysworked.Daysworked_ID;

The second query (Daily_Prod2) is supposed to retrieve records from M_Associates_Daysworked where the value of "Override" is -1 (meaning I am overriding the input found in M_Associates_Productivity with my own). The SQL here is:

SELECT M_Associates_Daysworked.Associate_ID, M_Associates_Daysworked.Work_Date,
       M_Associates_Daysworked.Registrations, M_Associates_Daysworked.Ambulances,
       M_Associates_Daysworked.Hours_Worked,
       Round(([Registrations]+(Nz([Ambulances])/2))/[Hours_Worked],2) AS Productivity,
       M_Associates_Daysworked.Daysworked_ID
FROM M_Associates_Daysworked
WHERE (((M_Associates_Daysworked.Override)=-1))
GROUP BY M_Associates_Daysworked.Associate_ID, M_Associates_Daysworked.Work_Date,
         M_Associates_Daysworked.Registrations, M_Associates_Daysworked.Ambulances,
         M_Associates_Daysworked.Hours_Worked,
         Round(([Registrations]+(Nz([Ambulances])/2))/[Hours_Worked],2),
         M_Associates_Daysworked.Daysworked_ID
HAVING (((M_Associates_Daysworked.Work_Date) Between Date() And DateAdd("yyyy",-2,Date())));

The third query (Associates_Productivity) retrieves values for every day worked. If the value of Override is -1 for a record, it uses the value found in query Daily_Prod2; if not, it uses the value found in query Daily_Prod1. It is this query that is used as the basis for reports and for one or two other queries. The SQL here is:

SELECT M_Associates_Daysworked.Associate_ID, M_Associates_Daysworked.Work_Date,
       IIf([Override]=-1,[Daily_Prod2]![Registrations],[Daily_Prod1]![Registrations]) AS Registrations,
       IIf([Override]=-1,[Daily_Prod2]![Ambulances],[Daily_Prod1]![Ambulances]) AS Ambulances,
       M_Associates_Daysworked.Hours_Worked,
       IIf([Override]=-1,[Daily_Prod2]![Productivity],[Daily_Prod1]![Productivity]) AS Productivity,
       Daily_Prod1.Collections,
       IIf([Override]=-1,"",[Daily_Prod1]![Keyed]) AS Keyed,
       IIf([Override]=-1,"",[Daily_Prod1]![Interviewed]) AS Interviewed
FROM (M_Associates_Daysworked
      LEFT JOIN Daily_Prod1
      ON M_Associates_Daysworked.Daysworked_ID = Daily_Prod1.Daysworked_ID)
     LEFT JOIN Daily_Prod2
     ON M_Associates_Daysworked.Daysworked_ID = Daily_Prod2.Daysworked_ID
ORDER BY M_Associates_Daysworked.Work_Date DESC;
My initial thought was that the IIF statements in this query were bogging things down, but I am at the limit of my current knowledge and skill level and am unsure.
If this is a cause for slowness, is there a better way to do this?
If you took the time to read all this, thank you. I also apologize profusely for its length. Any feedback will be taken happily.
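As an outside illustration of the override switch described above, here is a minimal SQLite sketch (not Access SQL; the table and column names are invented, stripped-down stand-ins for the real ones). It folds the IIf()-style choice into a single CASE expression over one grouped query:

```python
# Illustration only: a SQLite stand-in for the three-query IIf() pattern.
# Table and column names are invented, simplified versions of the originals.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE daysworked (
    daysworked_id INTEGER PRIMARY KEY,
    override      INTEGER,   -- Access Yes/No: -1 = yes, 0 = no
    registrations INTEGER,   -- management's manual override total
    hours_worked  REAL
);
CREATE TABLE productivity (
    daysworked_id INTEGER,
    registration  INTEGER    -- checkbox: -1 = ticked
);
INSERT INTO daysworked VALUES (1, 0, NULL, 8.0), (2, -1, 40, 8.0);
INSERT INTO productivity VALUES (1, -1), (1, -1), (1, 0), (2, -1);
""")

# One pass: CASE picks the manual total when Override = -1, otherwise
# the count aggregated from the detail table.
rows = con.execute("""
    SELECT d.daysworked_id,
           CASE WHEN d.override = -1
                THEN d.registrations
                ELSE ABS(SUM(p.registration))
           END
    FROM daysworked AS d
    LEFT JOIN productivity AS p ON p.daysworked_id = d.daysworked_id
    GROUP BY d.daysworked_id
    ORDER BY d.daysworked_id""").fetchall()
```

Whether collapsing the three queries into one would actually help in Access would need testing; this only shows the pattern.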
NeoPa:
Let me first say how surprised I was, when I checked your post count (generally a rough indicator of how experienced a member is), to see it read only one post. I could easily have expected a much larger number, and I congratulate you on a very well-laid-out question. Testing your hypotheses this early in the project is a very good sign, in my view, that you approach the subject seriously. That is always gratifying to those of us answering questions.
In fact, I don't feel the times you report are at all excessive. That said, the fastidious way you've approached the problem, and laid it out as well, interested me enough to read to the end.
I'll look at each query in turn and see if there are any comments to make (I may reformat the SQL for purposes of readability).

[Daily_Prod1]

SELECT   tDW.Associate_ID
       , tDW.Work_Date
       , Abs(Sum(tP.Registration)) AS Registrations
       , Abs(Sum(tP.Ambulance)) AS Ambulances
       , tDW.Hours_Worked
       , Round(([Registrations]+(Nz([Ambulances])/2)+(Nz([Interviewed])/2+Nz([Keyed])/2))/[Hours_Worked],2) AS Productivity
       , Sum(tP.Copay) AS Collections
       , Abs(Sum(tP.Interviewed)) AS Interviewed
       , Abs(Sum(tP.Keyed)) AS Keyed
       , tDW.Daysworked_ID
FROM     M_Associates_Daysworked AS tDW
         LEFT JOIN
         M_Associates_Productivity AS tP
  ON     tDW.Daysworked_ID = tP.Daysworked_ID
GROUP BY tDW.Associate_ID
       , tDW.Work_Date
       , tDW.Hours_Worked
       , tDW.Daysworked_ID

I would be surprised if line #6 didn't give you ambiguous-reference messages. If it doesn't, that would imply to me that the [Keyed] & [Interviewed] values you're expecting to use (presumably those defined in lines #8 & #9) are not actually the ones used. Notice that in lines #3 & #4 the aliases defined don't clash with (cause ambiguous references to) the fields from the table.

[Daily_Prod2]

SELECT   Associate_ID
       , Work_Date
       , Registrations
       , Ambulances
       , Hours_Worked
       , Round(([Registrations]+(Nz([Ambulances])/2))/[Hours_Worked],2) AS Productivity
       , Daysworked_ID
FROM     M_Associates_Daysworked
WHERE    ([Override] = -1)
GROUP BY Associate_ID
       , Work_Date
       , Registrations
       , Ambulances
       , Hours_Worked
       , Round(([Registrations]+(Nz([Ambulances])/2))/[Hours_Worked],2)
       , Daysworked_ID
HAVING   ([Work_Date] Between Date() And DateAdd('yyyy',-2,Date()))

This looks like SQL created for you by Access. The HAVING clause is for aggregated data, but Access is not fussy when it handles that, so you may want to move that condition to your WHERE clause (as [Work_Date] is not an aggregated result). I admit also to being a little confused about fields such as [Registrations]. In [Daily_Prod1] this is aggregated from tP.Registration. Is it found separately in [M_Associates_Daysworked]? That would be strange, I feel.

[Associates_Productivity]

SELECT   tDW.Associate_ID
       , tDW.Work_Date
       , IIf([Override]=-1,qDP2.Registrations,qDP1.Registrations) AS Registrations
       , IIf([Override]=-1,qDP2.Ambulances,qDP1.Ambulances) AS Ambulances
       , tDW.Hours_Worked
       , IIf([Override]=-1,qDP2.Productivity,qDP1.Productivity) AS Productivity
       , qDP1.Collections
       , IIf([Override]=-1,"",qDP1.Keyed) AS Keyed
       , IIf([Override]=-1,"",qDP1.Interviewed) AS Interviewed
FROM     (M_Associates_Daysworked AS tDW
         LEFT JOIN
         Daily_Prod1 AS qDP1
  ON     tDW.Daysworked_ID = qDP1.Daysworked_ID)
         LEFT JOIN
         Daily_Prod2 AS qDP2
  ON     tDW.Daysworked_ID = qDP2.Daysworked_ID
ORDER BY tDW.Work_Date DESC

Nothing too much to say about this. It seems a pretty efficient way to link the threads together.
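The WHERE-versus-HAVING point above can be sketched with SQLite (used here purely as a stand-in for Access, with simplified, invented names). Both forms return the same rows, but the WHERE version discards records before grouping rather than after:

```python
# SQLite stand-in (simplified names): the same date filter written in
# HAVING (as Access generated it) and in WHERE.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE daysworked (associate_id INTEGER, work_date TEXT,
                         registrations INTEGER);
INSERT INTO daysworked VALUES
    (1, '2011-01-10', 30), (1, '2013-01-10', 25), (2, '2012-06-01', 40);
""")
cutoff = '2012-01-01'

# Filter applied after grouping (HAVING on a non-aggregated column):
rows_having = con.execute("""
    SELECT associate_id, work_date, SUM(registrations)
    FROM daysworked
    GROUP BY associate_id, work_date
    HAVING work_date >= ?
    ORDER BY associate_id, work_date""", (cutoff,)).fetchall()

# Filter applied before grouping, so fewer rows reach the aggregation:
rows_where = con.execute("""
    SELECT associate_id, work_date, SUM(registrations)
    FROM daysworked
    WHERE work_date >= ?
    GROUP BY associate_id, work_date
    ORDER BY associate_id, work_date""", (cutoff,)).fetchall()
```

The result is the same either way here; the difference is when the rows are discarded, which is what matters at 2,000,000 records.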
I only skimmed the question, so pardon me if you already answered this somewhere, but what indexes did you create for your tables?
NeoPa:
I didn't cover that very important point Rabbit. It's good that you have :-)
I just worked with a client who needed to search 20 million+ records across multiple tables, and counting. We tried everything to make it run at a decent speed. We finally gave up and made a utility that ran at regular intervals and created an index table to use for searching instead. Joining massive tables with complicated search patterns is always a tough process. Consider that approach if it's at all applicable and you don't find another solution.
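The scheduled "index table" idea might look like this in outline (a SQLite sketch with invented names; in Access this could be an append query run on a schedule). The expensive aggregation is done once, and reports then read the small summary table:

```python
# SQLite sketch with invented names: rebuild a small summary table on a
# schedule, then point reports at it instead of the 2-million-row table.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE productivity (daysworked_id INTEGER, registration INTEGER);
INSERT INTO productivity VALUES (1, -1), (1, -1), (2, -1), (2, 0), (2, -1);
CREATE TABLE daily_summary (daysworked_id INTEGER PRIMARY KEY,
                            registrations INTEGER);
""")

def rebuild_summary(con):
    """The 'utility run at regular intervals': redo the aggregation once."""
    con.execute("DELETE FROM daily_summary")
    con.execute("""INSERT INTO daily_summary
                   SELECT daysworked_id, ABS(SUM(registration))
                   FROM productivity
                   GROUP BY daysworked_id""")

rebuild_summary(con)
summary = con.execute(
    "SELECT * FROM daily_summary ORDER BY daysworked_id").fetchall()
```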
Let me first of all thank you for your responses. For the first query I was fairly reckless in choosing names for the aliases, and now that you mention it I'm wondering why I didn't get the error you mentioned. I'll take a look at that to see if I can prevent future issues.
The fields Registrations and Ambulances do exist in the table M_Associates_Daysworked, and are also aggregated in the queries based on the table M_Associates_Productivity.
When employees perform work related to an account number, they enter the account number (encounter_number) and indicate whether it was a "registration", "ambulance", "interview", or "data entry (keyed)" by using a checkbox. This data is stored in the table M_Associates_Productivity. The count from the queries represents the totals of their input. I needed the account numbers as a way to verify their work.
The fields in table M_Associates_Daysworked are a way for management to override their input, or to enter input on their behalf without the tedium of entering a multitude of account numbers; they simply enter the total on a form. I should probably rename those table fields to prevent confusion in the future.
Indexes on table M_Associates_Daysworked:
Associate_ID, Daysworked_ID (unique), Work_Date, and then the index on the composite key [Associate_ID/Work_Date].
Are the separate indexes on Associate_ID and Work_Date redundant?
Indexes on table M_Associates_Productivity:
Associate_ID, Daysworked_ID, Productivity_ID, Shift_ID, and then an index on a composite key made up of [Encounter_Number, Registration (cb), Ambulance (cb), Interviewed (cb), Keyed (cb)]
(cb = checkbox).
Initially that large composite key was supposed to prevent duplicate entries being entered into the table. I later found a way to defeat that, so I used code on the input form to serve that purpose instead. Because of this I am not sure I still need the composite key.
Productivity_ID was the AutoNumber field Access initially gave the table. It was supposed to be a stable index to use in table relationships, but it is not currently used in any relationship.
The index on Shift_ID refers to the primary key of a static table with 4 entries. I did not include it previously since I didn't consider that it might have an impact.
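For what it's worth, a unique composite index enforces a one-row-per-key rule at the engine level, regardless of which form does the inserting. A small SQLite sketch (invented, simplified names), using the [Associate_ID/Work_Date] key as the example:

```python
# SQLite sketch (invented, simplified names): a composite key as a
# UNIQUE index, enforced by the engine no matter how the row arrives.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE daysworked (
                   associate_id INTEGER,
                   work_date    TEXT,
                   hours_worked REAL)""")
con.execute("""CREATE UNIQUE INDEX ux_assoc_date
               ON daysworked (associate_id, work_date)""")

con.execute("INSERT INTO daysworked VALUES (1, '2013-05-01', 8.0)")
try:
    # A second row for the same associate and day violates the unique index.
    con.execute("INSERT INTO daysworked VALUES (1, '2013-05-01', 7.5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```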
NeoPa:
As each record source used either has an index of [DaysWorked_ID], or is a query built on [DaysWorked_ID], I think you have the indexing point covered well enough.
The issue in Access of queries executing for noticeably long periods of time is not something you can generally avoid simply by various (and sometimes clever) optimisations. These can often help, but avoiding them completely is unattainable in any serious database (and yours gives every impression of being that). What I would suggest as a way of covering this issue is rather to ensure the operator is informed of what's going on so that they are comfortable with the concept, and are never tempted to break into the process due to ignorance and the fear that the process has failed when it doesn't respond in a timely manner. If you think this makes sense then please visit Progress Indicator in Access.
NeoPa: I am trying my best with this database, and I'm thankful for your suggestion. I really had no idea I could use a progress indicator. I believe this, along with informing the users, would be an effective solution. I will work on implementing it.
Netdynamic & Rabbit: Thank you two also for your efforts regarding this.
Everyone else: Thanks for taking a look at my problem as well.
Case closed for me.
NeoPa:
I'm very pleased you found that helpful. It's been a pleasure working with you on this.
If you find this approach interesting, another article, by a friend of mine, may also prove interesting - How to Use a Progress Meter in Access. You can choose which approach you prefer.
Here are a few more indexes you may want to create to speed things up a bit.
Hours_Worked, Override, Registrations, Ambulances
Basically, if you're going to join on it, filter it, or group by it, you want to create an index on it to speed up querying.
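That rule of thumb can be checked in engines that expose their query plan (SQLite shown here as a stand-in, since Access does not expose one; names are simplified). With an index on the filtered column, the planner searches the index rather than scanning the table:

```python
# SQLite stand-in demo: an index on a filtered column changes the query plan.
# Table and column names are simplified versions of the originals.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE daysworked (daysworked_id INTEGER, override INTEGER);
INSERT INTO daysworked VALUES (1, 0), (2, -1), (3, 0);
CREATE INDEX ix_override ON daysworked (override);
""")

# EXPLAIN QUERY PLAN reports whether the planner scans the table or
# searches the index to satisfy the WHERE clause.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM daysworked WHERE override = -1"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```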
NeoPa:
I'm not sure it's as simple as that, Rabbit. Part of the optimisation process for any query is to determine the single most appropriate available index to use. Certainly, the issues you mention come into making that choice, but my understanding, at least, is that only one index is used for the query (for each record source, to be specific). In this case I would expect everything to work from the [DaysWorked_ID] indices.
How the engine optimizes depends on the specific implementation, but it should be able to use multiple indexes for a single record source.
What follows is from SQL Server, but I assume it would work similarly in Access. The example is in SQL Server because I don't think Access has explain-plan functionality.
For the query:

SET SHOWPLAN_ALL ON
go

select count(*) from defect group by role1, role2;
go

set showplan_all off
go

The generated plan shows that it scans both indexes for the one record source:

select count(*) from defect group by role1, role2;
  |--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1005])))
       |--Hash Match(Aggregate, HASH:([Defect].[role1], [Defect].[role2]), RESIDUAL:([Defect].[role1]=[Defect].[role1] AND [Defect].[role2]=[Defect].[role2]) DEFINE:([Expr1005]=COUNT(*)))
            |--Hash Match(Inner Join, HASH:([Bmk1000])=([Bmk1000]), RESIDUAL:([Bmk1000]=[Bmk1000]))
                 |--Index Scan(OBJECT:([Defect].[ix_16782040]))
                 |--Index Scan(OBJECT:([Defect].[ix_16782060]))
NeoPa:
Interesting. I don't claim to be able to read this, but I can see that it appears to refer to two indices, as you say.
I would infer from this that there are situations where the use of multiple indices can be beneficial, though I would have thought those situations limited. What do you think? I can't say for sure, as I'm not well versed in such stuff. I generally try to work from the basic principles I do understand and extrapolate from there, which doesn't usually lead me too far astray. I would expect the abilities of modern optimisation processes to outstrip my understanding, though, so there will be things I don't anticipate.
It's hard to say what's best. I assume two separate indexes would be better than only one. However, the best case would be a composite index consisting of all the fields used in a query. I think that would be the quickest to read, but also the largest and the slowest to update.
I suspect, though, that most of the delay is caused by grouping on that expression, and you really can't index that.
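The composite-index idea can also be checked against a query plan (SQLite again as a stand-in; the table and column names borrow from the defect example above). When a composite index contains every column a GROUP BY query needs, the engine can answer from the index alone:

```python
# Hypothetical, simplified version of the defect example, run in SQLite.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE defect (role1 INTEGER, role2 INTEGER);
INSERT INTO defect VALUES (1, 1), (1, 2), (1, 2);
-- Composite index covering every column the query below touches:
CREATE INDEX ix_roles ON defect (role1, role2);
""")

counts = con.execute(
    "SELECT role1, role2, COUNT(*) FROM defect GROUP BY role1, role2"
).fetchall()

# The plan reads the composite index rather than the table itself.
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT role1, role2, COUNT(*) FROM defect GROUP BY role1, role2"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```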