index bloat? - Microsoft SQL Server

sql_server_2000_user

Hi,

I have a table with about 305 million rows, and a composite primary key
that consists of an ascending int and an ascending varchar(18), which
is typically of length 13. Even if all the keys used the full 18
characters of the varchar, it seems to me each key should be 22 bytes,
so the index should be roughly 6.4GB. However, the size of the index as
shown in EM is about 24GB, and this is slowing everything down
considerably. Does anyone else think this index size is a little
excessive, or know why it should be so large?

Thanks,
Seth

Aug 4 '05 #1

Subscribe Reply

4884

Erland Sommarskog

sql_server_2000 _user (se*********@co mcast.net) writes:

I have a table with about 305 million rows, and a composite primary key
that consists of an ascending int and an ascending varchar(18), which
is typically of length 13. Even if all the keys used the full 18
characters of the varchar, it seems to me each key should be 22 bytes,
so the index should be roughly 6.4GB. However, the size of the index as
shown in EM is about 24GB, and this is slowing everything down
considerably. Does anyone else think this index size is a little
excessive, or know why it should be so large?

Is that a clustered index or a non-clustered iodex?

A clustered index has the data pages in the leafs of the index node, so
size of index is basically size of data.
--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Aug 4 '05 #2

Tony Sebion

Just a guess, but is your primary key a clustered index? The clustered
index represents the actual contents of each row in your database, so it
would make sense that it would be larger than only the indexed fields.

If this is the case, you might want to change that index to not be
clustered and choose a more appropriate field to make a clustered index.

Thanks,
Tony

"sql_server_200 0_user" <se*********@co mcast.net> wrote in message
news:11******** **************@ g49g2000cwa.goo glegroups.com:

Hi,

I have a table with about 305 million rows, and a composite primary key
that consists of an ascending int and an ascending varchar(18), which
is typically of length 13. Even if all the keys used the full 18
characters of the varchar, it seems to me each key should be 22 bytes,
so the index should be roughly 6.4GB. However, the size of the index as
shown in EM is about 24GB, and this is slowing everything down
considerably. Does anyone else think this index size is a little
excessive, or know why it should be so large?

Thanks,
Seth

Aug 4 '05 #3

sql_server_2000_user

Yes! It is intentionally a clustered index. But if the db includes the
row data pages when it reports the size of a clustered index, why is
the size of the index not just equal to the size of the table (105GB?)

Thanks,
Seth

Aug 4 '05 #4

Tony Sebion

I think I would take a look at the FILL FACTOR of the index (the % of
the index that is created empty), and possibly do a DBCC SHOWCONTIG to
show the current fragmentation and DBCC REINDEX on the table during a
time of little or no activity if I thought that the tables contents
might be fragmented.

You can look up any of the all-caps terms in Books Online if you want
more info about the commands.

Good luck,
Tony Sebion

"sql_server_200 0_user" <se*********@co mcast.net> wrote in message
news:11******** **************@ g47g2000cwa.goo glegroups.com:

Yes! It is intentionally a clustered index. But if the db includes the
row data pages when it reports the size of a clustered index, why is
the size of the index not just equal to the size of the table (105GB?)

Thanks,
Seth

Aug 4 '05 #5

sql_server_2000_user

thanks for the tips. the fill factor is zero on the index; i'll
definitely look for fragmentation. what i don't understand is, the
query was fast yesterday (4 secs), and today it's slow (10 mins), and
the only change i made to the table was that i added an index on an
unrelated field. (the new index is 12GB)

Aug 4 '05 #6

Tony Sebion

Now that you mention it, you will want to make sure you REINDEX all the
other indexes for that table after you're done reindexing the clustered
one. I'd start there and see what things look like when that is
complete.

Tony

"sql_server_200 0_user" <se*********@co mcast.net> wrote in message
news:11******** **************@ g44g2000cwa.goo glegroups.com:

thanks for the tips. the fill factor is zero on the index; i'll
definitely look for fragmentation. what i don't understand is, the
query was fast yesterday (4 secs), and today it's slow (10 mins), and
the only change i made to the table was that i added an index on an
unrelated field. (the new index is 12GB)

Aug 4 '05 #7

Erland Sommarskog

sql_server_2000 _user (se*********@co mcast.net) writes:

Yes! It is intentionally a clustered index. But if the db includes the
row data pages when it reports the size of a clustered index, why is
the size of the index not just equal to the size of the table (105GB?)
From where did you get that? Enterprise Manager? EM is not known to report
sizes very well. I tried to have a look at it, but was not able to dig
out any sizes at all from EM. (Yeah, that's right, I don't use EM that
often.)

Anyway, your real problem appears to be:
what i don't understand is, the query was fast yesterday (4 secs), and
today it's slow (10 mins), and the only change i made to the table was
that i added an index on an unrelated field. (the new index is 12GB)

It appears that some how this cause a shake-up for the query plan. When
you create a new index, statistics are updated with fullscan. Since the PK
is the clustered index, the PK columns are included in the non-clustered
index as well, as the index keys for the clustered index work as row
locator. I don't know if this causes statistics on the PK to be updated
as well. It could also be that SQL Server makes an incorrect estimate
and thinks that the new index is good for the query.

It would help if you posted the query, the CREATE TABLE statement for
the table, as well as the CREATE INDEX statements.
--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Aug 4 '05 #8

sql_server_2000_user

Wow, you guys are great. Yeah, I know EM returns wack object sizes, but
I thought they were kind of close; maybe not. Anyway, here's a test I
find interesting:

query 1:

select * from loan_history where time_period=196 and exloan_id in (
select top 3 exloan_id from loan where deal_no='ML4W1' )

query 2:

select * from loan_history where time_period=196 and exloan_id in
('CTSMYX0010515 778', 'CTSMYX00105257 10', 'CTSMYX00105274 75')

Query 1 takes a long time - i don't know how long because i cancel it
after a few minutes. Query 2 is <1 sec. the odd thing is, the subquery
of query 1 is also <1 sec! ???

The query that used to be fast and is now slow, requiring a huge amount
of reads, is simply:

select * from loan
inner join loan_history lh
on loan.exloan_id = lh.exloan_id
where loan.deal_no='M L4W1'
and time_period='19 6'

Here's the output of DBCC SHOWCONTIG on loan_history (305mm rows):

DBCC SHOWCONTIG scanning 'loan_history' table...
Table: 'loan_history' (949578421); index ID: 1, database ID: 7
TABLE level scan performed.
- Pages Scanned........ ............... .........: 7497085
- Extents Scanned........ ............... .......: 941533
- Extent Switches....... ............... ........: 1489125
- Avg. Pages per Extent......... ............... : 8.0
- Scan Density [Best Count:Actual Count].......: 62.93%
[937136:1489126]
- Logical Scan Fragmentation ............... ...: 93.40%
- Extent Scan Fragmentation ............... ....: 2.39%
- Avg. Bytes Free per Page........... ..........: 535.3
- Avg. Page Density (full)......... ............: 93.39%
DBCC execution completed. If DBCC printed error messages, contact your
system administrator.
And for loan (9.8mm rows):

DBCC SHOWCONTIG scanning 'loan' table...
Table: 'loan' (1413580074); index ID: 1, database ID: 7
TABLE level scan performed.
- Pages Scanned........ ............... .........: 462476
- Extents Scanned........ ............... .......: 58046
- Extent Switches....... ............... ........: 77279
- Avg. Pages per Extent......... ............... : 8.0
- Scan Density [Best Count:Actual Count].......: 74.81% [57810:77280]
- Logical Scan Fragmentation ............... ...: 96.45%
- Extent Scan Fragmentation ............... ....: 0.54%
- Avg. Bytes Free per Page........... ..........: 329.5
- Avg. Page Density (full)......... ............: 95.93%
DBCC execution completed. If DBCC printed error messages, contact your
system administrator.

Sorry for the long post, but here's the create table statement for
loan:

CREATE TABLE [loan] (
[SERIES_NO] [varchar] (12) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT
NULL ,
[POOL_ID] [char] (3) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT NULL ,
[DEAL_NO] [char] (5) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT NULL ,
[GROUP_NO] [char] (3) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT NULL ,
[LOAN_ID] [char] (6) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT NULL ,
[EXLOAN_ID] [varchar] (18) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT
NULL ,
[LOAN_NO] [varchar] (12) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT NULL
,
[ADD_DATE] [int] NULL ,
[PROP_ZIP] [char] (5) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[STATE] [char] (2) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[PROP_TYPE] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[UNITS_NO] [int] NULL ,
[OCCUPANCY] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[ORIG_DATE] [int] NULL ,
[MATURITY] [int] NULL ,
[FIRST_PMT] [int] NULL ,
[ORIG_AMT] [money] NULL ,
[CLOSE_BAL] [money] NULL ,
[CLOSE_INT] [float] NULL ,
[SALE_PRICE] [money] NULL ,
[APP_VALUE] [money] NULL ,
[PROD_TYPE] [char] (2) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[TERM] [int] NULL ,
[INIT_RATE] [float] NULL ,
[UNDER_RAT1] [float] NULL ,
[UNDER_RAT2] [float] NULL ,
[LOAN_TYPE] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[PURPOSE] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[PMT_FREQ] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[LOAN_SRC] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[BUYDOWN] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[DOCUMENT] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[PMI_CODE] [char] (3) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[CONVERT] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[POOL_INS] [bit] NULL ,
[RECOURSE] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[LTV] [float] NULL ,
[SERV_FEE_R] [float] NULL ,
[NEGAM] [bit] NULL ,
[NEG_LIMIT] [float] NULL ,
[INDEX_ID] [char] (2) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[MARGIN] [float] NULL ,
[P_RATE_CAP] [float] NULL ,
[P_RATE_FLO] [float] NULL ,
[P_PAY_CAP] [float] NULL ,
[P_PAY_FLO] [float] NULL ,
[L_RATE_CAP] [float] NULL ,
[L_RATE_FLO] [float] NULL ,
[RATE_RESET] [int] NULL ,
[PAY_RESET] [int] NULL ,
[FIRST_RATE] [int] NULL ,
[FIRST_PAY] [int] NULL ,
[AMORT_TERM] [int] NULL ,
[DOC_RAW] [varchar] (5) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[FICO] [int] NULL ,
[LIEN] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[GRADE_RAW] [varchar] (5) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[GRADE_MIC] [varchar] (5) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[PP_PEN] [bit] NULL ,
[PP_TERM] [int] NULL ,
[F_RATE_CAP] [float] NULL ,
[PMI_LEVEL] [float] NULL ,
[PLEDGE_AMT] [money] NULL ,
[EFF_LTV] [float] NULL ,
[FIRST_LTV] [float] NULL ,
[SECOND_LTV] [float] NULL ,
[COMB_LTV] [float] NULL ,
[SERVICER] [varchar] (30) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[ORIGINATOR] [varchar] (30) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL
,
[SILENT_SECOND] [bit] NOT NULL ,
CONSTRAINT [PK_loan] PRIMARY KEY CLUSTERED
(
[EXLOAN_ID]
) ON [PRIMARY]
) ON [PRIMARY]
GO

And for loan_history:

CREATE TABLE [loan_history] (
[TIME_PERIOD] [int] NOT NULL ,
[POOL_ID] [char] (3) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT NULL ,
[DEAL_NO] [char] (5) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT NULL ,
[GROUP_NO] [char] (3) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT NULL ,
[SERVICER] [char] (4) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[LOAN_ID] [char] (6) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT NULL ,
[EXLOAN_ID] [varchar] (18) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT
NULL ,
[LAST_INT_P] [int] NULL ,
[BALANCE] [money] NULL ,
[INT_RATE] [float] NULL ,
[TOTPMT_DUE] [money] NULL ,
[SCH_PRINC] [money] NULL ,
[SCH_MNTH_P] [money] NULL ,
[MBA_STAT] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[OTS_STAT] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[DELIQ_HIST] [char] (12) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[EXCEPTION] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[FC_START_D] [int] NULL ,
[FC_END_D] [int] NULL ,
[FC_END_TYP] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[PAYOFF_D] [int] NULL ,
[PAYOFF_R] [char] (1) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
[REO_DATE] [int] NULL ,
[INV_BAL] [money] NULL ,
[NEXT_INT_R] [float] NULL ,
[LOSS_AMT] [money] NULL ,
[INET_RATE] [float] NULL ,
[YYYYMM] AS ((floor(([TIME_PERIOD] / 12)) + 1989) * 100 +
[TIME_PERIOD] % 12 + 1) ,
CONSTRAINT [PK_loan_history] PRIMARY KEY CLUSTERED
(
[TIME_PERIOD],
[EXLOAN_ID]
) ON [PRIMARY]
) ON [PRIMARY]
GO
Thanks for all the help, I really appreciate it.
Seth

Aug 4 '05 #9

Erland Sommarskog

sql_server_2000 _user (se*********@co mcast.net) writes:

query 1:

select * from loan_history where time_period=196 and exloan_id in (
select top 3 exloan_id from loan where deal_no='ML4W1' )

query 2:

select * from loan_history where time_period=196 and exloan_id in
('CTSMYX0010515 778', 'CTSMYX00105257 10', 'CTSMYX00105274 75')

Query 1 takes a long time - i don't know how long because i cancel it
after a few minutes. Query 2 is <1 sec. the odd thing is, the subquery
of query 1 is also <1 sec! ???
Well, it's not the case that SQL Server first runs the subquery and given
that result optimizes the outer query. It optimizes everything in one
go.

Then again, it's difficult to say why it would fail here, since it knows
that it will get (at most) three rows from loan, and thus only have to
read three rows from loan_history.

You forgot to include indexes for the tables, but I assume that here is
a non-clustered index on loan.deal_no?

What query plan do you have for the slow query here? Run the query
embedded in SET SHOWPLAN_TEXT ON /OFF. (In separate batches.)
The query that used to be fast and is now slow, requiring a huge amount
of reads, is simply:

select * from loan
inner join loan_history lh
on loan.exloan_id = lh.exloan_id
where loan.deal_no='M L4W1'
and time_period='19 6'
For this query it would probably be better if the index on
loan.deal_no was clustered, possibly then it should be (deal_no, exloan_id).
But this could have consequences for other queries.

Again, it would be interesting to see the query plan.

By the way, using SELECT * in production code is not good practice.
Better to list all columns you reallyneed.
Here's the output of DBCC SHOWCONTIG on loan_history (305mm rows):

Thanks. The tables appears to be in decent shape. Not perfect, but
certainly not alarming.

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Aug 4 '05 #10

Similar topics

1574

Fulltext index bloat!

by: steve | last post by:

I accept that fulltext index would be bloated due to all the atomic indexing it has to do. BUT, looking at my mysql db directory, for each table, there is a single index file consisting of all the indecis - regular and fulltext put together. The result is that to access any simple index, the whole monsterous index file has to be pulled...

MySQL Database

3989

Which causes more code-bloat: inline virtual or template class?

by: RainBow | last post by:

Greetings!! I introduced the so-called "thin-template" pattern for controlling the code bloat caused due to template usage. However, one of the functions in the template happens to be virtual as well. To support thin-template, I need to make virtual function as inline. Now, I know that compiler would generate an out-of-line copy when it

C / C++

1331

tsearch2 and gist index bloat

by: George Essig | last post by:

I have installed tsearch2 and have noticed that the gist index used to do searches grows and grows as I update rows, delete rows, or run VACUUM FULL ANALYZE. Below are some details: PostgreSQL 7.4RC1 Red Hat 9 Table "public.series" Column | Type | Modifiers...

PostgreSQL Database

2703

Another database bloat question.

by: Danny J. Lesandrini | last post by:

I know this has been discussed before, as I've seen the Google posts, but they all leave me with an unanswered question: Does my DAO code executed in the front end cause the back end to bloat? (May also substitute UPDATE and/or DELETE queries for DAO code.) I was just brought on to a project with Access 97 where the all data is kept on...

Microsoft Access / VBA

4060

Yet another Backend Bloat Discussion

by: BillCo | last post by:

I'm using a backend that's been around for years before I joined the company. It kind of grew exponentially and has some design problems including large unused tables some temporary tables. It compacts to 150Mb, but after a day it's up to 300Mb and it gros steadily from there. I need to: (a) reduce the size (b) reduce the bloat factor ...

Microsoft Access / VBA

6699

Database bloat - general discussion...

by: MLH | last post by:

I've read a number of posts over the years that dealt with the issue of database bloat. I'm wondering if anyone has determined exactly what information comprises the bloat when it does occur. Has anyone ever looked into that and arrived at a conclusion? Part-2 of the question is "Under what circumstances does MS Access database bloat occur...

Microsoft Access / VBA

29190

Can't get around "IndexError: list index out of range"

by: erikwickstrom | last post by:

Hi all, I'm sorry about the newbie question, but I've been searching all afternoon and can't find the answer! I'm trying to get this bit of code to work without triggering the IndexError. import shutil, os, sys

Python

5552

Why would you create an index with "ignore nulls" set to yes?

by: shira | last post by:

I have done a fair bit of searching, but haven't yet been able to find an explanation as to why one would set "ignore nulls" to "yes" when creating an index. I understand what it does (I think), but I'm looking to understand what scenario might prompt either setting (yes or no). Any clarity you can provide is much appreciated! Thanks kindly.

Microsoft Access / VBA

1747

where will I notice negative side effects of bloat?

by: mirandacascade | last post by:

Question toward the bottom of this post....background information immediately below. Access 97 SQL Server 2000 Please note: although the subject line uses the word 'bloat', this post is NOT a "what can I do to prevent bloat?" inquiry. When I searched the postings in this group for information about bloat, I believe I gained a...

Microsoft Access / VBA

7479

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...

General

7439

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...

Windows Server

5987

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...

Career Advice

5343

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...

Microsoft Access / VBA

4962

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...

C# / C Sharp

3450

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

1901

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

1028

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

722

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

General