I've probably not given the best title to this topic, but that
reflects my relative "newbie" status.
I have a table that goes essentially
TS DATETIME
jobnumber VARCHAR
jobentry VARCHAR
...
the TS is a time stamp, and the other two fields are job number
and entries. There are (many) more fields, but this is the core of
it. The relationship is that there will be several entries per job,
with one row in the table per entry (i.e. several rows per job).
In constructing a web interface I want to create a list of recent
job numbers, and summarize it broadly as follows
max(ts)
jobnumber
count(jobentry)
...
I can do this by a select command as follows
select top 30 max(ts) as time, jobnumber, count(jobentry)
from Jobs
group by jobnumber
order by time desc
However I'm now finding that this is quite slow now that my test table
has around 400,000 entries (even though I've added indexes to most
relevant fields). In particular it's much slower than
select top 30 jobnumber
from Jobs
group by jobnumber
order by jobnumber desc
leading me to suspect that the aggregate functions are slowing this
down. I would guesstimate the difference in speed is around a factor
of 10-20.
As I type this I think I've just realised it's the
order by time desc
that is probably causing the problem (in that it will need to
calculate max(ts) for every jobnumber in the table in order to
execute the "order by" clause).
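If that's the diagnosis, one thing I suppose I could try first is a
composite index so the grouped max(ts) can be read off the index rather
than the table. Something like this (the index name is mine, and whether
the optimizer would actually use it I'd have to check against the query
plan):

create index IX_Jobs_jobnumber_ts on Jobs (jobnumber, ts)

With jobnumber leading and ts second, the maximum ts per job is in
principle available at the end of each index range.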
I think I can see the solution now (amazing what typing out a long
question can do for you :-)
My question was going to be whether there are any approved
methods/tricks for avoiding this sort of performance hit.
It seemed to me that if I could first grab a decent number of recent
records (quickly) I could then execute the aggregate functions against
that smaller set of records. I know in broad terms how many entries
there can be per job, so I could select enough to make at least 30
jobs, and then execute the real query against that. In my case there
will probably be fewer than 10 entries per job, so I could grab 300
records and execute against that, instead of against the whole
400,000.
That being the case, is this best done
a) as a subquery, and if so, how
b) by creating a temporary intermediate table (which is
in essence a more explicit version of (a) isn't it?)
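For what it's worth, here's roughly what I had in mind for (a),
assuming SQL Server syntax (the derived table alias "recent" and the
300-row figure are just my guesses from above):

select top 30 max(ts) as time, jobnumber, count(jobentry) as entries
from (select top 300 ts, jobnumber, jobentry
      from Jobs
      order by ts desc) as recent
group by jobnumber
order by time desc

and the temp-table version (b) would be the same thing spelled out in
two steps:

select top 300 ts, jobnumber, jobentry
into #recent
from Jobs
order by ts desc

select top 30 max(ts) as time, jobnumber, count(jobentry) as entries
from #recent
group by jobnumber
order by time desc

The caveat with both is that a job with entries older than the 300-row
cutoff would get an undercounted count(jobentry).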
Another solution that occurred was
c) to create and maintain a summary table, with just
one record per jobnumber with a view to having it
serve this particular (common) enquiry.
For (c) to work I would probably need triggers on the Jobs table to
maintain the summary table. This would probably represent more
overhead overall, but should certainly give a better response
when the query outlined above is executed on the web site.
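Sketched out, (c) might look something like this (the table and trigger
names are mine, and this simple version assumes entries are only ever
inserted one at a time, never updated or deleted):

create table JobSummary (
    jobnumber varchar(50) primary key,
    last_ts datetime,
    entries int
)

create trigger trg_Jobs_summary on Jobs after insert
as
begin
    -- bump existing jobs
    update s
    set s.last_ts = i.ts, s.entries = s.entries + 1
    from JobSummary s join inserted i on s.jobnumber = i.jobnumber

    -- first entry for a new job
    insert into JobSummary (jobnumber, last_ts, entries)
    select jobnumber, ts, 1
    from inserted i
    where not exists (select 1 from JobSummary s
                      where s.jobnumber = i.jobnumber)
end

The web query would then reduce to a trivial
"select top 30 ... from JobSummary order by last_ts desc".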
The response time to this query is becoming critical on the web
interface to avoid timeouts.
Any suggestions or comments are welcome. If it's RTFM, then I'd
appreciate page numbers :-)
Thanks for reading this far :-)
--
HTML-to-text and markup removal with Detagger
http://www.jafsoft.com/detagger/