473,556 Members | 2,475 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

CREATE INDEX on large table

SQL Server 7/2000: We have reasonably large tables (3,000,000 rows)
that we need to add some indexes for. In a test, it took over 12 hours
to CREATE a new INDEX against this table. One of us suggested that we
create a temp table with the new index and copy the data from the old
table into the new one, then rename it. I understand this took 15
minutes. Why the heck would it be faster to move the data and build
multiple indexes incrementally vs adding an index??

Jul 23 '05 #1
11 27181
An index on a sorted table is quicker as the indexing process does not need
to reorganized it as its creating the index.

"dfurtney" <df******@hotma il.com> wrote in message
news:11******** **************@ z14g2000cwz.goo glegroups.com.. .
SQL Server 7/2000: We have reasonably large tables (3,000,000 rows)
that we need to add some indexes for. In a test, it took over 12 hours
to CREATE a new INDEX against this table. One of us suggested that we
create a temp table with the new index and copy the data from the old
table into the new one, then rename it. I understand this took 15
minutes. Why the heck would it be faster to move the data and build
multiple indexes incrementally vs adding an index??

Jul 23 '05 #2
Hi

This seems unlikely.
You did not mention if the table is a heap (i.e. there is no clustered
index).
You probably ran into a case where the data in the old table was so out
of order that building the additional index was constatnly splitting
pages.

Jul 23 '05 #3
dfurtney (df******@hotma il.com) writes:
SQL Server 7/2000: We have reasonably large tables (3,000,000 rows)
that we need to add some indexes for. In a test, it took over 12 hours
to CREATE a new INDEX against this table. One of us suggested that we
create a temp table with the new index and copy the data from the old
table into the new one, then rename it. I understand this took 15
minutes. Why the heck would it be faster to move the data and build
multiple indexes incrementally vs adding an index??


12 hours to create an index for three million rows sounds abnormal.

Of course, if the table did not have a clustered index, but already had
several non-clustered index, and you added a clustered index, then it
will take some time, but still not 12 hours.

One possible reason, is that the CREATE INDEX process was blocked by
another process most of the time.

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #4
The table, vehicle_history , had a clustered primary key and 2
additional indexes. We were adding an additional index on a single
integer column. That column was filled with a single default value of
0 in this case. It was running on a dedicated QA server.

We want to add this index to speed up a query against the new column.
But we found it was taking much longer than we expected to add the
index. The only thing that seemed somewhat unusual is the size of the
table.

Jul 23 '05 #5
dfurtney (df******@hotma il.com) writes:
The table, vehicle_history , had a clustered primary key and 2
additional indexes. We were adding an additional index on a single
integer column. That column was filled with a single default value of
0 in this case. It was running on a dedicated QA server.

We want to add this index to speed up a query against the new column.
But we found it was taking much longer than we expected to add the
index. The only thing that seemed somewhat unusual is the size of the
table.


Does all three million rows have 0 in this column? In that case it would
not be a very good index.

I have no idea whether large amount of duplicate values could be reason
that creating the index so long. I still lean towards that there was some-
thing else, for instance blocking, that was the cause. It simply doesn't
take 12 hours to create a non-clustered index on a three-million row table.
--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #6
The query/index would be used by a subset of our customers that utilize
a specific feature of the product. For them, the values would be
non-zero, of course, and the resulting index quite selective. However,
we were going to add the index for all customers since we generally
don't know what functionaly they will be utilizing. This dataset was
from a customer not using that function.

I thought it was awfully long and one of my co-workers was going to do
some testing. What order of time would you expect?

Jul 23 '05 #7
dfurtney (df******@hotma il.com) writes:
The query/index would be used by a subset of our customers that utilize
a specific feature of the product. For them, the values would be
non-zero, of course, and the resulting index quite selective. However,
we were going to add the index for all customers since we generally
don't know what functionaly they will be utilizing. This dataset was
from a customer not using that function.

I thought it was awfully long and one of my co-workers was going to do
some testing. What order of time would you expect?


The below script which emulates the situation you have described ran
in eight minutes on my workstation, a 2.8 GHz HT box with 1 GB of RAM
(but with SQL Server constrained to some 120 MB), running Windows XP SP2.
The particular part of creating a non-clustered index on a column with
non-variant values took 30 seconds. (But then all data was in cache.)

Of course, not only number of rows count, but the size of the rows as
well, since wider the rows, the more pages you get. Then again, for
the sorting phase there are still only three million rows.

It occurred to me that one thing you could have run into is autogrow.
If the database is 300 GB, and you have 10% autogrow and this happens to
set in during the index creation, you're in for a pause. Initializing
30 GB of data does take some time. Not 12 hours though. 20-30 minutes
may be expected.

use master
go
drop database klump
go
create database klump
go
use klump
go
select TOP 3000000 klumpid = identity(int, 1, 1),
slaskcol = 0,
a.* into klump
from Northwind..Orde rs a
cross join Northwind..Orde rs b
cross join Northwind..Orde rs c
go
ALTER TABLE klump ADD CONSTRAINT pk_klump PRIMARY KEY (klumpid)
go
CREATE INDEX orderidix ON klump (OrderID)
CREATE INDEX customerid ON klump (CustomerID)
go
SELECT getdate()
go
CREATE INDEX slaskix ON klump(slaskcol)
go
SELECT getdate()

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #8
Although this happened twice on two machines when analyzing the upgrade
script via profiler -- our latest attempts to isolate what is happening
ended up not reproducing our earlier results. The indexes are building
in 2 minutes when tested in isolation outside the script. The only
operation which we have reliably reproduced as slow is adding a column
with a default value to a large table - which is taking on the order of
20 minutes in the million row range. We do this a number of times. We
don't yet have an explanation for why the script took so long, although
we are running the analysis one more time as I speak.

Sorry to have presented you with a problem that didn't reproduce - but
we were about to make some drastic changes based on the faulty
assumption that building million row indexes was much more expensive
then it really is. I appreciate the help you folks have provided!

Jul 23 '05 #9
dfurtney (df******@hotma il.com) writes:
Although this happened twice on two machines when analyzing the upgrade
script via profiler -- our latest attempts to isolate what is happening
ended up not reproducing our earlier results. The indexes are building
in 2 minutes when tested in isolation outside the script. The only
operation which we have reliably reproduced as slow is adding a column
with a default value to a large table - which is taking on the order of
20 minutes in the million row range. We do this a number of times.
This sounds like a perfectly normal time for such an operation. Since
this column has to be copied into every row, the entire table has to
be rewritten. And unless every page bas bytes to spare for the new column,
you also get rows rearranged, and it is not a simple update in place.

In this case, it can sometimes be better to create a new table and
copy data over. (This in fact what we always do in our update scripts,
although that more has to do with the greater flexibility this
technique offers.)
Sorry to have presented you with a problem that didn't reproduce - but
we were about to make some drastic changes based on the faulty
assumption that building million row indexes was much more expensive
then it really is. I appreciate the help you folks have provided!


Oh, never mind! I think your concern was very valid, and I am glad to
have helped by telling you that it must have been a false alarm.

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
2804
by: Guy Deprez | last post by:
Hi, i'm having a problem to create indexes. STEP 1 ----------- Connection is OK (you can find the string at the end of the message) Table ("Couleurs") creation is OK STEP 2. Index Creation
8
3241
by: Együd Csaba | last post by:
Hi All, how can I improve the query performance in the following situation: I have a big (4.5+ million rows) table. One query takes approx. 9 sec to finish resulting ~10000 rows. But if I run simultaneously 4 similar queries it takes nearly 5 minutes instead of 4 times 9 seconds or something near of that. here is a sample query: select ...
7
10794
by: Bing Wu | last post by:
Hi Folks, I have a very large table containing 170 million rows of coordinats: CREATE TABLE "DB2ADMIN"."COORDINATE" ( "FID" INTEGER NOT NULL , "AID" INTEGER NOT NULL , "X" REAL NOT NULL , "Y" REAL NOT NULL , "Z" REAL NOT NULL )
4
5864
by: maricel | last post by:
Could someone confirm which tablespace is being used when running ALTER & CREATE INDEX. Is it the tempspace or the tablespace where the table resides? Many thanks, maricel
3
6867
by: Joachim Klassen | last post by:
Hi all, first apologies if this question looks the same as another one I recently posted - its a different thing but for the same szenario:-). We are having performance problems when inserting/deleting rows from a large table. My scenario: Table (lets call it FACT1) with 1000 million rows distributed on 12
17
8468
by: Jeffrey W. Baker | last post by:
Greetings, I have a 23GB data table upon which I am building a primary key of three columns. The data is mounted in a 137GB device and pg_xlog is mounted on a separate 3.5GB device. I have configured 24 checkpoint segments, which I expect gives me a worst-case usage in pg_xlog of 384MB. Unfortunately, during the CREATE INDEX, pg_xlog...
2
1533
by: John Wells | last post by:
Guys, I have a general question about designing databases for large data sets. I was speaking with a colleague about an application we're preparing to build. One of the application's tables will potentially contain 2 million or more names, containing (at least) the fields first_name, last_name, middle_name and prefix. A common lookup...
6
5922
by: Dilip1983 | last post by:
Hi All, I want to delete duplicate records from a large table. There is one index(INDEX_U1) on 4 columns(col1,col2,col3,col4) which is in unusable state. First of all when i tried to rebuild index it showed error as unique key violation. So i want to delete duplicate records for col1,col2,col3,col4 combination. How can i delete the...
1
2590
by: vivek samantray | last post by:
I have a query.When i try to create a index on one of the table the index gets created but when i take the output it stuill shows "TABLE ACCESS FULL" Please see below what i did QUERY ====== SELECT b.DPT_EVNT_ID, b.DPT_ENTITY_ID, a.EDD_DT, b.DPT_DPT_ID, b.OU_ID,
0
7622
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
7825
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. ...
0
8060
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
0
7904
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
0
6176
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
1
5452
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
5171
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3575
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
0
865
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.