John D (dobjsonneNOSPAMhotmail.com) writes:
Really? I would guess that the big change from 130 to 150 was due to
changing from using a non-clustered index to using a table scan.
Yes! Why is that?
That is what is happening, and is causing the biggest problem.
SQL Server uses a cost-based optimizer, which means that it tries
to find the cheapest way to evaluate a query. To compute the costs for
different query plans, it relies on statistics about the data. Statistics
are usually sampled automatically as you run queries, since SQL Server
has to scan the data anyway.
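If you want to see what statistics SQL Server has for an index, you
can look at them with DBCC SHOW_STATISTICS. A minimal sketch, assuming
a table tbl with an index ix_col on it (both names are made up for
the example):

-- Shows when the statistics were last updated, how many rows were
-- sampled, and the distribution histogram
DBCC SHOW_STATISTICS (tbl, ix_col)

If "Rows Sampled" is much smaller than the number of rows in the
table, the statistics may be too coarse for the optimizer to place
the balance point correctly.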
In this particular case, it appears that SQL Server has a choice between
using a non-clustered index and a table scan. Say that your query is
SELECT TOP 150 * FROM tbl ORDER BY col
and there is a non-clustered index on col. There are two possible query
plans here: one is to follow the index on col and stop when you have
found 150 rows; the other is to scan the table and then find the 150
rows with the lowest values of col.
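You can see for yourself which plan is cheaper by forcing the index
with a table hint and comparing the page reads. A sketch, assuming
the non-clustered index on col is called ix_col (the name is made up):

SET STATISTICS IO ON

-- The plan the optimizer picks on its own
SELECT TOP 150 * FROM tbl ORDER BY col

-- Force the non-clustered index, for comparison
SELECT TOP 150 * FROM tbl WITH (INDEX (ix_col)) ORDER BY col

SET STATISTICS IO OFF

Compare the logical reads reported for the two statements; the plan
with fewer reads is the one you want the optimizer to pick.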
Why does SQL Server not always use the index? Because when using a non-
clustered index, SQL Server must access the data pages to get the data.
This means that if you retrieve many rows, you will access the same
page more than once, and in the end you get worse performance. So at
some point, switching to a table scan is better. It appears that in
this case, SQL Server is making an incorrect guess about where the
balance point is. This could be because the statistics are not accurate
enough, and it's possible that an UPDATE STATISTICS WITH FULLSCAN on
the table could have some effect, although it may only push the limit
in some direction without resolving the problem. My experience is that
SQL Server is conservative about using non-clustered indexes.
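To rebuild the statistics from all rows rather than from a sample,
you would run:

-- Reads every row in tbl, not just a sample
UPDATE STATISTICS tbl WITH FULLSCAN

Note that this reads the whole table, so on a big table it is itself
something to run outside peak hours.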
If you instead made the index clustered, your problems should disappear.
In a clustered index, the leaf level of the index is the data, so
there are no extra accesses to data pages. If the query actually is
SELECT * FROM tbl WHERE othercol = @value ORDER BY col
then the clustered index should have othercol as its first column.
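A sketch of what that could look like (the index name is just an
example):

-- The leaf level of a clustered index is the data itself, so the
-- rows come back in order without any extra page lookups
CREATE CLUSTERED INDEX ix_othercol_col ON tbl (othercol, col)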
(I should add that there is one exception for non-clustered indexes:
if the index includes all columns in the query, there is no need to
access the data pages, and such a covered query can be quite efficient.)
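For instance, if the query hypothetically only needed othercol and
col, a non-clustered index with both columns in the key would cover
it (the index name is again made up):

-- All columns the query touches are in the index keys, so the
-- data pages are never accessed
CREATE NONCLUSTERED INDEX ix_cover ON tbl (othercol, col)

A query like SELECT col FROM tbl WHERE othercol = @value ORDER BY col
would then be answered from the index alone.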
Is it best to create the indexes out of office hours? I'm worried that
creating them will take up considerable space on the server, and/or
slow the server down whilst inserts/updates/deletes are running?
Indexes do incur a toll on INSERT, UPDATE and DELETE, that is true.
Usually this toll is acceptable and in fact not even noticeable. But
you should avoid putting the clustered index on columns that often
change value, since if the clustered key changes, not only must the
data row move, but all non-clustered indexes are affected as well,
because the pointer to the data from a non-clustered index is in fact
the clustered key.
As for when to create indexes, it is usually a good idea to do it
outside peak times, particularly if you change or add a clustered
index, because the table is basically offline for the duration of the
CREATE INDEX statement. Non-clustered indexes usually take less time
to create.
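If you are replacing an existing clustered index, the DROP_EXISTING
option is worth knowing about, since it rebuilds the index in one
operation instead of having the non-clustered indexes rebuilt twice.
A sketch, again with made-up names (the new index must have the same
name as the one it replaces):

-- Replaces the old clustered index in a single operation;
-- SORT_IN_TEMPDB moves the sort work out of the user database
CREATE CLUSTERED INDEX ix_othercol_col ON tbl (othercol, col)
   WITH DROP_EXISTING, SORT_IN_TEMPDB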
--
Erland Sommarskog, SQL Server MVP,
es****@sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp