
Indexing strategy

I have a bit of a problem with regard to an indexing strategy. Well,
basically there is no indexing strategy on a set of data I have at
work. Now, I didn't create the design; if I had, I would have allowed for this.

OK, so there are primary key (clustered) indexes (mainly composite
keys), but no other indexes on the tables. As you would expect, the
performance leaves a lot to be desired. A hell of a lot. We have
several million rows in a lot of the tables. None of the queries seem
to be overly complex, so we can work through the applications at a later
stage.

We have multiple copies (one per client per country) of the same
structure (I would have considered combining these for better
performance). One specific SP that I need to run takes 40+ hours
without indexes and 5 hours with some (130) basic indexes to get us
started on a better design. These 130 indexes are the minimum I suspect
we need; from there, we can work out which ones are genuinely required.

Now the test database (our worst performer) doubles in size to 20 GB,
but the performance is much, much better (as expected). The original
thinking behind the design was storage concerns (server space was
recently upgraded) and performance with bulk inserts.

We have a lot of bulk inserts, but I suspect that these are not too
bad, and the time taken to maintain the indexes during them is negligible
compared to the performance gains. I strongly suspect that we have a
considerable number of table scans going on, but the problem is that
people here don't understand indexing (yet) or don't have the time
(probably because it's all taken up using the poorly designed system).
That's a whole separate issue for me to address.

So, finally getting round to my questions...

Is there any good reference explaining in layman's terms why you need
basic (or advanced) indexing? Any links would be appreciated. I need
this to help explain to colleagues why a disk space increase plus
indexing will be far better than spending thousands on a new box and
doing the same (a common problem, I suspect).

How can I accurately estimate the amount of time taken to update an
index once data is bulk inserted? I don't want to re-index from scratch
as this may take too long. Indexing my database the first time round takes
about 1 hour 30 minutes.

It's all part of an ongoing bit of digging into the system and re-doing
it to make it work properly. I'm sure most of you will have been there
at some point or another.

Thanks
Ryan

Jun 12 '06 #1
Stu
Hey Ryan,

I can't think of any layman's resources for indexing strategies, but I
know that Kalen Delaney's book "Inside SQL Server" has some very good
explanations of how things work under the hood (including indexes);
might be useful.

However, I am a bit perplexed as to the need for 130+ indexes being
affected by a single stored procedure; is this because of the redundant
data structure per client? Can you post some sample table structures?
It sounds as if the clustered index is actually a poor candidate given
the volume of data and the amount of time to perform a bulk insert.

Stu
Ryan wrote:
> [original post quoted in full -- snipped]

Jun 12 '06 #2
Ryan (ry********@hotmail.com) writes:
Is there any good reference explaining in Layman's terms why you need
basic (or advanced) indexing ? Any links would be appreciated. I need
this to help explain to colleagues why a disk space increase and
indexing will be far better than spending thousands on a new box and
doing the same (a common problem I suspect).
I echo Stu and recommend Kalen's book.

Understanding indexing correctly is absolutely essential to getting the best
performance out of SQL Server (or any other RDBMS, for that matter).
How can I accurately estimate the amount of time taken to update an
index once data is bulk inserted. I don't want to re-index from scratch
as this may take too long. Indexing my database first time round takes
about 1 hour 30 minutes.


For 130 indexes? That's not too bad.

In any case, the best way to find out the cost of an index for bulk
insert is to benchmark.
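A minimal timing harness for such a benchmark might look like this (a sketch only; the commented-out middle is where your actual load goes):

```sql
-- Record the start time, run the load with the candidate indexes in
-- place, then report the elapsed time in milliseconds.
DECLARE @start datetime
SET @start = GETDATE()

-- ... your BULK INSERT / bcp load goes here ...

SELECT DATEDIFF(ms, @start, GETDATE()) AS elapsed_ms
```

Run it once with the indexes dropped and once with them in place; the difference is the index-maintenance cost for that load.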
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jun 12 '06 #3
It's 130 basic indexes across all of the tables, some of which are used
in the SP. I examined all of the tables to work out the minimum; from there
we can start working on a proper strategy. Example table:

if exists (select * from dbo.sysobjects
           where id = object_id(N'[dbo].[DEALER_SOURCE_DATA_VALUES]')
           and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[DEALER_SOURCE_DATA_VALUES]
GO

CREATE TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] (
    [DEALER_SOURCE_DATA_ID] [int] NOT NULL ,
    [FIELD_CODE] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [FIELD_VALUE] [numeric](15, 5) NOT NULL
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] WITH NOCHECK ADD
    CONSTRAINT [PK_DEALER_SOURCE_DATA_VALUES] PRIMARY KEY CLUSTERED
    (
        [DEALER_SOURCE_DATA_ID],
        [FIELD_CODE]
    ) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

This table has approx 25 million rows. I would have added indexes on
Dealer_Source_Data_Id and Field_Code separately. This table holds the
data ID, which can be related back to a row in another table for the
year, month, and dealer ID. It also holds all the lines and the value
against those lines. Quite often queries are needed for everything
by Dealer_Source_Data_Id, hence the first index I would add. Also,
we want to sum (for example) all of a specific Field_Code, hence the
other index I would add.

We would normally have approx 4,000 unique Field_Codes per
Dealer_Source_Data_Id.
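The two indexes described above would look something like this (the index names are just my own convention):

```sql
-- For queries that fetch every row for one Dealer_Source_Data_Id.
-- (This one may turn out to be redundant, since the clustered PK
-- already leads on this column.)
CREATE NONCLUSTERED INDEX IX_DSDV_DEALER_SOURCE_DATA_ID
    ON dbo.DEALER_SOURCE_DATA_VALUES (DEALER_SOURCE_DATA_ID)
GO

-- For summing values across a specific Field_Code.
CREATE NONCLUSTERED INDEX IX_DSDV_FIELD_CODE
    ON dbo.DEALER_SOURCE_DATA_VALUES (FIELD_CODE)
GO
```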

The PK is correct, but not good for the queries we need, as it
would only be used when we specify both Dealer_Source_Data_Id and
Field_Code in order to return a value. Correct? Anything beyond that
would force a table scan, if I'm right. Proper indexing should allow
us to scan at most 4,000 rows using one index and 6,250 rows
using the other.
Ryan

Stu wrote:
> [previous reply quoted in full -- snipped]

Jun 13 '06 #4
A good rule of thumb:
1) Index all FKs.

If you have time and money:
Analyse all your queries, and you will be able to work out all the
indexes you need.

If you don't have time or money:
Start working on the long-running queries first and optimize them. You
will get big results fast.

90% of all performance problems I have seen in the past come from queries
(the query itself or the logic in the design), not the network, RAM, CPU,
or disk. So start there.
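As a sketch of point 1 (the table and column names here are purely illustrative), indexing a foreign key looks like:

```sql
-- Hypothetical child table: ORDERS references CUSTOMERS(CUSTOMER_ID).
-- SQL Server does not index foreign key columns automatically, so joins
-- and cascades on CUSTOMER_ID would otherwise scan ORDERS.
CREATE NONCLUSTERED INDEX IX_ORDERS_CUSTOMER_ID
    ON dbo.ORDERS (CUSTOMER_ID)
GO
```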

Jun 13 '06 #5
Ryan (ry********@hotmail.com) writes:
ALTER TABLE [dbo].[DEALER_SOURCE_DATA_VALUES] WITH NOCHECK ADD
CONSTRAINT [PK_DEALER_SOURCE_DATA_VALUES] PRIMARY KEY CLUSTERED
(
[DEALER_SOURCE_DATA_ID],
[FIELD_CODE]
) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

This table has approx 25 million rows. I would have added indexes to
Dealer_Source_Data_Id and Field Code seperately. This table holds the
data ID which can be related back to a row in another table for the
year, month and Dealer ID. It also holds all lines and the value
against those lines. Quite often queries would be needed for everything
by the Dealer_Source_Data_Id hence the first index I would add. Also,
we want to sum (for example) all of a specific field_code, hence the
other index I would add.

We would normally have approx 4,000 unique Field_Codes per
Dealer_Source_Data_Id.

The PK is correct, but not good for querying what we need to as it
would only be used when we specify both Dealer_Source_Data_Id and
Field_Code in order to return a value. Correct ?


Not really. A query only by DEALER_SOURCE_DATA_ID would use the
clustered index. Unless you do queries like:

SELECT COUNT(*), DEALER_SOURCE_DATA_ID
FROM tbl
GROUP BY DEALER_SOURCE_DATA_ID

there is no reason to add an NC index on DEALER_SOURCE_DATA_ID.
But you are correct that the clustered index is not good for queries
on a certain FIELD_CODE.
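A sketch of the kind of nonclustered index that would help the FIELD_CODE sums (SQL Server 2000 syntax; the index name is illustrative):

```sql
-- Adding FIELD_VALUE as a second key column makes the index covering
-- for queries like SELECT SUM(FIELD_VALUE) ... WHERE FIELD_CODE = 'X',
-- so the base table is never touched.
CREATE NONCLUSTERED INDEX IX_DSDV_FIELD_CODE_VALUE
    ON dbo.DEALER_SOURCE_DATA_VALUES (FIELD_CODE, FIELD_VALUE)
GO
```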
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jun 13 '06 #6
> Is there any good reference explaining in Layman's terms why you need
basic (or advanced) indexing ?
I assume you need to explain this to the people who spend the money.

Do you know how a binary search works? I usually explain the benefits of
indexing like this: suppose you are searching the phonebook. You aren't
going to check every entry on every page until you find the name you are
looking for (a full table scan).

You could split the book in half and determine whether the name you are
looking for is in the left or the right half, then keep doing that until
you get to the right page. At worst you will make log base 2 of N
comparisons, N being the total number of entries. You can search 1,000,000
entries with about 20 comparisons, versus an average of 500,000 without an
index.

Or I just say it's like those little tabs in some dictionaries (which it
isn't, really), but it gets across the point that you are making less work
for SQL Server.

--
-Dick Christoph
"Ryan" <ry********@hotmail.com> wrote in message
news:11**********************@f6g2000cwb.googlegroups.com...
> [original post quoted in full -- snipped]

Jun 14 '06 #7
Use the 80/20 rule to start. Usually 20% of your queries will account for
80% of your interactions with the database. Identify the 20% that are used
most often and create indexes that optimize them. You can grab a query
plan (I prefer the graphical query plans in Query Analyzer) for each query
to identify the bottlenecks and where indexes would be most effective.
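For anyone following along, a quick sketch of grabbing a plan as text without actually executing the query (the table and filter here are just examples):

```sql
-- Show the estimated plan instead of running the statement.
SET SHOWPLAN_TEXT ON
GO

SELECT SUM(FIELD_VALUE)
FROM dbo.DEALER_SOURCE_DATA_VALUES
WHERE FIELD_CODE = 'EXAMPLE'
GO

SET SHOWPLAN_TEXT OFF
GO
```

A "Clustered Index Scan" or "Table Scan" operator in the output is the usual sign that a supporting index is missing.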

"Ryan" <ry********@hotmail.com> wrote in message
news:11**********************@f6g2000cwb.googlegroups.com...
> [original post quoted in full -- snipped]

Jun 14 '06 #8
