Bytes | Software Development & Data Engineering Community

Delete Many Rows

Hi All,
I am designing a purge process for a db that has grown to almost 200GB.
My purge process will remove about 1/3 of the 500 million rows spread
over seven tables. Currently there are about 35 indexes defined on
those seven tables. My question is: will there be a performance gain by
dropping those indexes, doing my purge, and re-creating the indexes? I
am afraid that leaving those indexes in place will create a lot of
extra overhead in my delete statements by having to maintain the
indexes. I know that it could take many hours to rebuild the indexes
afterward, but I am planning on doing that anyway. The reason I want
to know whether I should drop the indexes ahead of time is that I may
not be able to do the entire purge at once, and the tables may need to
be accessed between purges. If this occurs, I will need to have those
indexes in place.

So do I drop the indexes before the purge and re-create them later or
do I leave them in place and re-index them afterward?

Thanks In Advance

p.h.

Mar 17 '06 #1
How often are queries run on these tables?

--
Jack Vamvas
___________________________________
Receive free SQL tips - www.ciquery.com/sqlserver.htm

Mar 17 '06 #2
I understand that the tables are accessed frequently during the day. I
know that I have to be very sensitive to affecting the response times
of queries against these tables. I am not the DBA that owns this
database, so I do not have access to specific details about the queries
against the tables.

Mar 17 '06 #3
Without knowing the query access during this time it's hard to say, but
another idea is to create a staging table (a persistent backup table with
all the data, not a SQL temp table) and move the records you want to keep
there. Then truncate your parent table, and move those records back into
the original table.
Do the tables have clustered indexes?
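
The copy-out / truncate / copy-back idea above might look like this sketch. All table and column names (dbo.Orders, OrderDate) are placeholders, and note that TRUNCATE TABLE will fail if foreign keys reference the table:

```sql
-- Copy the rows you want to KEEP into a persistent work table:
SELECT *
INTO   dbo.Orders_Keep
FROM   dbo.Orders
WHERE  OrderDate >= '20050101';

-- Fast, minimally logged removal of everything:
TRUNCATE TABLE dbo.Orders;

-- Put the kept rows back:
INSERT INTO dbo.Orders
SELECT * FROM dbo.Orders_Keep;

DROP TABLE dbo.Orders_Keep;
```

This only works safely if nothing modifies the parent table between the copy-out and the copy-back.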

Mar 17 '06 #4
Most of the tables do not have a clustered index, one does.

I thought about the temp table option, but the database may get rows
updated while purging the old rows. I would then lose those updates
when I truncate the parent table.

Mar 17 '06 #5
(ha*****@yahoo.com) writes:
> So do I drop the indexes before the purge and re-create them later or
> do I leave them in place and re-index them afterward?

If you really want to know - benchmark. In any case, you should not
run such a heavy operation in production before testing it on a copy
of the live data. In a test environment, you can try different strategies.

Of course, if there is a requirement that the database be accessible
while you are doing your purge, you will have to find a low-impact purge
that deletes fairly small slices at a time, and in this case dropping the
indexes is out of the question.

Again, this is why running this in a test environment is important. If
you can say "I was able to run the purge, including restoring the
indexes that I dropped, in eight hours", then it may be deemed
acceptable to take the database offline.

There is another factor to it. If you drop and recreate the indexes,
you will address the fragmentation in the indexes that your purge will
inevitably cause. Worrying here, though, is that only one table has a
clustered index. This means that the data pages of the other six tables
will remain fragmented. For this reason, I would recommend that once your
purge has completed, you build clustered indexes on these tables.
I would also recommend that you keep them in place, but if your DBA thinks
they are bad, you can just drop them. In that case, drop the clustered
index before you recreate the non-clustered indexes. (Otherwise, dropping
a clustered index is a costly operation, since every non-clustered index
has to be rebuilt.)
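
A minimal sketch of that sequence, assuming hypothetical names throughout (dbo.BigTable, SomeKey, OtherCol, and the index names):

```sql
-- 1. After the purge, build a clustered index to compact the data pages:
CREATE CLUSTERED INDEX CIX_BigTable
    ON dbo.BigTable (SomeKey);

-- 2. If the DBA wants the table back as a heap, drop the clustered
--    index BEFORE recreating the non-clustered indexes; dropping it
--    afterward would force every non-clustered index to be rebuilt:
DROP INDEX dbo.BigTable.CIX_BigTable;  -- SQL Server 2000-style syntax

-- 3. Only then recreate the non-clustered indexes:
CREATE INDEX IX_BigTable_OtherCol
    ON dbo.BigTable (OtherCol);
```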
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Mar 17 '06 #6
I have been testing heavily on a full copy of the production data.
Unfortunately the test server that I was given is a much slower machine
than the production box.
In my test environment, doing the purge without the indexes is much
faster than purging with the indexes in place. However, it took over
17 hours to rebuild the indexes afterward.
My estimates suggest that purging without the indexes then rebuilding
them will take about 40 straight hours. If I try to do the purge with
the indexes in place, it will take about 200 hours.
If I can get a window large enough, dropping the indexes, purging, and
rebuilding looks like the way to go. If I cannot get that window, then
I have to do it over multiple weekends leaving the indexes in place.
The key is that the indexes must be in place at the end of the window
so that production processing can resume.
Of course these numbers will be much lower on the production box, but
since I cannot test there, I cannot guess how much lower they will be.

Mar 21 '06 #7
Stu
Rather than running the delete as one continuous statement, you may want
to delete the rows in batches ordered by the clustered index column. It
may take several hours, but at least it'll be a scripted job rather than
watching it do nothing. And if something blows up, committing the
transaction periodically means the rollback won't hurt as much.
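
A sketch of that batched-delete pattern for SQL Server 2000/2005, using SET ROWCOUNT; the table, column, cutoff date, and batch size are all placeholders:

```sql
-- Delete in bounded batches so each implicit transaction stays small.
SET ROWCOUNT 100000;           -- cap every DELETE at one batch
WHILE 1 = 1
BEGIN
    DELETE FROM dbo.BigTable
    WHERE  CreatedDate < '20050101';

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to purge
    -- Each DELETE commits on its own here, so an interruption only
    -- loses the in-flight batch, not the whole purge.
END
SET ROWCOUNT 0;                -- restore the default
```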

Mar 21 '06 #8
My process involves breaking the purge into many smaller chunks:
purge a chunk, commit, then grab the next chunk. So while it will run
continuously during my available window, there will be a commit after
every chunk, which is a few thousand rows. The issue is that I would
like to be able to finish all of the chunks in one window and then
rebuild the indexes.

Mar 21 '06 #9
ha*****@yahoo.com (ha*****@yahoo.com) writes:
> I have been testing heavily on a full copy of the production data.
> Unfortunately the test server that I was given is a much slower machine
> than the production box.

That is probably a good thing. :-) At least, you will not get estimates
that are overly optimistic.

> In my test environment, doing the purge without the indexes is much
> faster than purging with the indexes in place. However, it took over
> 17 hours to rebuild the indexes afterward.

That's indeed a long time.

> My estimates suggest that purging without the indexes then rebuilding
> them will take about 40 straight hours. If I try to do the purge with
> the indexes in place, it will take about 200 hours.

Hm, unless you have some extra non-business days around Easter you can
use, it sounds like a purge bit by bit with the indexes in place.

Of course, if you can get a window from Friday night to Monday morning,
that will suffice, but it will be a little nerve-wracking. Then again,
the indexes can always be recreated by restoring a backup, but then all
the purge work would be lost.

Stu's suggestion to go by the clustered index is a good one, but I recall
that only one of your tables has one.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Mar 21 '06 #10
ha*****@yahoo.com (ha*****@yahoo.com) writes:
> My process involves breaking the process into many smaller chunks.
> Purging that chunk and doing a commit, then grab the next chunk. So
> while it will run continuously during my available window, there will
> be a commit after every chunk, which is a few thousand rows. The issue
> is that I would like to be able to finish all of the chunks in one
> window then rebuild the indexes.

A few thousand? That's too small! It depends on how wide your
tables are, but I would suggest 100,000 rows. Too-small batches can cost
you time, because it takes time to locate each chunk.

While it takes a lot of time, benchmarking is the only way to find out
what is a good value.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Mar 21 '06 #11

<ha*****@yahoo.com> wrote in message
news:11*********************@j33g2000cwa.googlegroups.com...
> I have been testing heavily on a full copy of the production data.
> Unfortunately the test server that I was given is a much slower machine
> than the production box.
> In my test environment, doing the purge without the indexes is much
> faster than purging with the indexes in place. However, it took over
> 17 hours to rebuild the indexes afterward.

One suggestion, if you can: move the indexes to separate spindles (but not
the clustered index).

Also, Enterprise Edition can parallelize a lot of index work, so it may be
faster.

(One job I run quarterly runs 3-4x faster on one box, and it appears the
deciding factor is probably Enterprise vs. Standard.)
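
A hedged sketch of moving a non-clustered index onto its own filegroup (and thus separate spindles) as part of a rebuild; INDEXES_FG and the table, column, and index names are placeholders, and the filegroup must already exist:

```sql
-- Rebuild the existing index and place it on a dedicated filegroup
-- in one step (WITH DROP_EXISTING avoids a separate DROP INDEX):
CREATE INDEX IX_BigTable_Col1
    ON dbo.BigTable (Col1)
    WITH DROP_EXISTING
    ON [INDEXES_FG];
```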


Mar 22 '06 #12
Boy, I sure am not a fan of clustered indexes, and especially where you
are deleting chunks of data throughout, I do NOT understand the drive
to move to clustered indexes. But I'm sure listening and wanting to
learn!

The basic issues are understood. The sheer scale of this project leads
you to some interesting issues.
If you can knock off access to the database, then do it in one fell
swoop. If not, then figure out an artificial way to limit the deletes
and do it in chunks. Rebuild the indexes one by one in off hours, and
just bite the bullet: it will take a few weeks.
Mar 22 '06 #13
