
Indexing strategy

I have a bit of a problem with regard to an indexing strategy. Well,
basically there is no indexing strategy on a set of data I have at
work. Now, I didn't create the design; if I had, I would have allowed for this.

OK, so there are primary key (clustered) indexes (mainly composite
keys), but no other indexes on the tables. As you would expect, the
performance leaves a lot to be desired. A hell of a lot. We have
several million rows in a lot of the tables. None of the queries seem
to be overly complex, so we can work through the applications at a later
stage.

We have multiple copies (one per client per country) of the same
structure (I would have considered combining these for better
performance). One specific SP that I need to run takes 40+ hours
without indexes and 5 hours with some (130) basic indexes to get us
started on a better design. These 130 indexes are the minimum I suspect
we need; from there, we can work out which ones are genuinely required.

Now the test database (our worst performer) doubles in size to 20 GB,
but the performance is much, much better (as expected). The original
thinking behind the design was storage concerns (server space was
recently upgraded) and performance with bulk inserts.

We have a lot of bulk inserts, but I suspect that these are not too
bad, and the time taken to maintain the indexes during them is negligible
compared to the performance gains. I strongly suspect that we have a
considerable number of table scans going on, but the problem is that
people here don't understand indexing (yet) or don't have the time
(probably because it's all taken up using the poorly designed system).
That's a whole separate issue for me to address.

So, finally getting round to my questions...

Is there any good reference explaining in layman's terms why you need
basic (or advanced) indexing? Any links would be appreciated. I need
this to help explain to colleagues why a disk space increase plus
indexing will be far better than spending thousands on a new box and
doing the same (a common problem, I suspect).

How can I accurately estimate the amount of time taken to update an
index once data is bulk inserted? I don't want to re-index from scratch
as this may take too long. Indexing my database the first time round takes
about 1 hour 30 minutes.

It's all part of an ongoing bit of digging into the system and re-doing
it to make it work properly. I'm sure most of you will have been there
at some point or another.

Thanks
Ryan

Jun 12 '06 #1
Stu
Hey Ryan,

I can't think of any layman's resources for indexing strategies, but I
know that Kalen Delaney's book "Inside SQL Server" has some very good
explanations of how things work under the hood (including indexes);
might be useful.

However, I am a bit perplexed as to the need for 130+ indexes being
affected by a single stored procedure; is this because of the redundant
data structure per client? Can you post some sample table structures?
It sounds as if the clustered index is actually a poor candidate given
the volume of data and the amount of time to perform a bulk insert.

Stu
Ryan wrote:
> [original post quoted in full -- snipped]

Jun 12 '06 #2
Ryan (ry********@hotmail.com) writes:
Is there any good reference explaining in Layman's terms why you need
basic (or advanced) indexing ? Any links would be appreciated. I need
this to help explain to colleagues why a disk space increase and
indexing will be far better than spending thousands on a new box and
doing the same (a common problem I suspect).
I echo Stu and recommend Kalen's book.

Understanding indexing correctly is absolutely essential to getting the best
performance out of SQL Server (or any other RDBMS, for that matter).
How can I accurately estimate the amount of time taken to update an
index once data is bulk inserted. I don't want to re-index from scratch
as this may take too long. Indexing my database first time round takes
about 1 hour 30 minutes.


For 130 indexes? That's not too bad.

In any case, the best way to find out the cost of an index for bulk
insert is to benchmark.
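A minimal timing harness for such a benchmark might look like this (a sketch only; the commented-out middle is where your actual load goes):

```sql
-- Record the start time, run the load with the candidate indexes in
-- place, then report the elapsed time in milliseconds.
DECLARE @start datetime
SET @start = GETDATE()

-- ... your BULK INSERT / bcp load goes here ...

SELECT DATEDIFF(ms, @start, GETDATE()) AS elapsed_ms
```

Run it once with the indexes dropped and once with them in place; the difference is the index-maintenance cost for that load.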
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jun 12 '06 #3
It's 130 basic indexes across all of the tables, some of which are used
in the SP. I examined all of the tables to work out the minimum; from there
we can start working on a proper strategy. Example table:

if exists (select * from dbo.sysobjects
           where id = object_id(N'[dbo].[DEALER_SOURCE_DATA_VALUES]')
           and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[DEALER_SOURCE_DATA_VALUES]
GO

CREATE TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] (
    [DEALER_SOURCE_DATA_ID] [int] NOT NULL ,
    [FIELD_CODE] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [FIELD_VALUE] [numeric](15, 5) NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] WITH NOCHECK ADD
    CONSTRAINT [PK_DEALER_SOURCE_DATA_VALUES] PRIMARY KEY CLUSTERED
    (
        [DEALER_SOURCE_DATA_ID],
        [FIELD_CODE]
    ) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

This table has approx 25 million rows. I would have added indexes on
Dealer_Source_Data_Id and Field_Code separately. This table holds the
data ID, which can be related back to a row in another table for the
year, month, and dealer ID. It also holds all the lines and the value
against those lines. Quite often queries are needed for everything
by Dealer_Source_Data_Id, hence the first index I would add. Also,
we want to sum (for example) all of a specific Field_Code, hence the
other index I would add.

We would normally have approx 4,000 unique Field_Codes per
Dealer_Source_Data_Id.
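The two indexes described above would look something like this (the index names are just my own convention):

```sql
-- For queries that fetch every row for one Dealer_Source_Data_Id.
-- (This one may turn out to be redundant, since the clustered PK
-- already leads on this column.)
CREATE NONCLUSTERED INDEX IX_DSDV_DEALER_SOURCE_DATA_ID
    ON dbo.DEALER_SOURCE_DATA_VALUES (DEALER_SOURCE_DATA_ID)
GO

-- For summing values across a specific Field_Code.
CREATE NONCLUSTERED INDEX IX_DSDV_FIELD_CODE
    ON dbo.DEALER_SOURCE_DATA_VALUES (FIELD_CODE)
GO
```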

The PK is correct, but not good for the queries we need, as it
would only be used when we specify both Dealer_Source_Data_Id and
Field_Code in order to return a value. Correct? Anything beyond that
would force a table scan, if I'm right. Proper indexing should allow
us to scan at most 4,000 rows using one index and 6,250 rows
using the other.
Ryan

Stu wrote:
> [previous reply quoted in full -- snipped]

Jun 13 '06 #4
A good rule of thumb:
1) Index all FKs.

If you have time and money:
Analyse all your queries, and you will be able to work out all the
indexes you need.

If you don't have time or money:
Start working on the long-running queries first and optimize them. You
will get big results fast.

90% of all performance problems I have seen in the past come from queries
(the query itself or the logic in the design), not the network, RAM, CPU,
or disk. So start there.
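As a sketch of point 1 (the table and column names here are purely illustrative), indexing a foreign key looks like:

```sql
-- Hypothetical child table: ORDERS references CUSTOMERS(CUSTOMER_ID).
-- SQL Server does not index foreign key columns automatically, so joins
-- and cascades on CUSTOMER_ID would otherwise scan ORDERS.
CREATE NONCLUSTERED INDEX IX_ORDERS_CUSTOMER_ID
    ON dbo.ORDERS (CUSTOMER_ID)
GO
```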

Jun 13 '06 #5
Ryan (ry********@hotmail.com) writes:
ALTER TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] WITH NOCHECK ADD
CONSTRAINT [PK_DEALER_SOURCE_DATA_VALUES] PRIMARY KEY CLUSTERED
(
[DEALER_SOURCE_DATA_ID],
[FIELD_CODE]
) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

This table has approx 25 million rows. I would have added indexes to
Dealer_Source_Data_Id and Field Code seperately. This table holds the
data ID which can be related back to a row in another table for the
year, month and Dealer ID. It also holds all lines and the value
against those lines. Quite often queries would be needed for everything
by the Dealer_Source_Data_Id hence the first index I would add. Also,
we want to sum (for example) all of a specific field_code, hence the
other index I would add.

We would normally have approx 4,000 unique Field_Codes per
Dealer_Source_Data_Id.

The PK is correct, but not good for querying what we need to as it
would only be used when we specify both Dealer_Source_Data_Id and
Field_Code in order to return a value. Correct ?


Not really. A query only by DEALER_SOURCE_DATA_ID would use the
clustered index. Unless you do queries like:

SELECT COUNT(*), DEALER_SOURCE_DATA_ID
FROM tbl
GROUP BY DEALER_SOURCE_DATA_ID

there is no reason to add an NC index on DEALER_SOURCE_DATA_ID.
But you are correct that the clustered index is not good for queries
on a certain FIELD_CODE.
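A sketch of the kind of nonclustered index that would help the FIELD_CODE sums (SQL Server 2000 syntax; the index name is illustrative):

```sql
-- Adding FIELD_VALUE as a second key column makes the index covering
-- for queries like SELECT SUM(FIELD_VALUE) ... WHERE FIELD_CODE = 'X',
-- so the base table is never touched.
CREATE NONCLUSTERED INDEX IX_DSDV_FIELD_CODE_VALUE
    ON dbo.DEALER_SOURCE_DATA_VALUES (FIELD_CODE, FIELD_VALUE)
GO
```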
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jun 13 '06 #6
> Is there any good reference explaining in Layman's terms why you need
basic (or advanced) indexing ?
I assume you need to explain this to the people who spend the money.

Do you know how a binary search works? I usually explain the benefits of
indexing like this: suppose you are searching the phonebook. You aren't
going to check every entry on every page until you find the name you are
looking for (a full table scan).

You could split the book in half and determine whether the name you are
looking for is in the left or the right half, then keep doing that until
you get to the right page. At worst you will make log base 2 of N
comparisons, N being the total number of entries. You can search 1,000,000
entries with about 20 comparisons, versus an average of 500,000 without an
index.

Or I just say it's like those little tabs in some dictionaries (which it
isn't, really), but it gets across the point that you are making less work
for SQL Server.

--
-Dick Christoph
"Ryan" <ry********@hotmail.com> wrote in message
news:11**********************@f6g2000cwb.googlegroups.com...
> [original post quoted in full -- snipped]

Jun 14 '06 #7
Use the 80/20 rule to start. Usually 20% of your queries will account for
80% of your interactions with the database. Identify the 20% that are used
most often and create indexes that optimize them. You can grab a query
plan (I prefer the graphical query plans in Query Analyzer) for each query
to identify the bottlenecks and where indexes would be most effective.
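For anyone following along, a quick sketch of grabbing a plan as text without actually executing the query (the table and filter here are just examples):

```sql
-- Show the estimated plan instead of running the statement.
SET SHOWPLAN_TEXT ON
GO

SELECT SUM(FIELD_VALUE)
FROM dbo.DEALER_SOURCE_DATA_VALUES
WHERE FIELD_CODE = 'EXAMPLE'
GO

SET SHOWPLAN_TEXT OFF
GO
```

A "Clustered Index Scan" or "Table Scan" operator in the output is the usual sign that a supporting index is missing.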

"Ryan" <ry********@hotmail.com> wrote in message
news:11**********************@f6g2000cwb.googlegroups.com...
> [original post quoted in full -- snipped]

Jun 14 '06 #8
