By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
458,112 Members | 1,584 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 458,112 IT Pros & Developers. It's quick & easy.

Delete Many Rows

P: n/a
Hi All,
I am designing a purge process for a db that has grown to almost 200GB.
My purge process will remove about 1/3 of the 500 million rows spread
over seven tables. Currently there are about 35 indexes defined on
those seven tables. My question is will there be a performance gain by
dropping those indexes, doing my purge, and re-creating the indexes. I
am afraid that leaving those indexes in place will create a lot of
extra overhead in my delete statements by having to maintain the
indexes. I know that it could take many hours to rebuild the indexes
afterward, but I am planning on doing that anyway. The reason that I
want to know whether I should drop the indexes ahead of time, is I may
not be able to do the entire purge at once and the tables may need to
be accessed between purges. If this occurs, I will need to have those
indexes in place.

So do I drop the indexes before the purge and re-create them later or
do I leave them in place and re-index them afterward?

Thanks In Advance

p.h.

Mar 17 '06 #1
Share this Question
Share on Google+
12 Replies


P: n/a
How often are queries run on these tables?

--
Jack Vamvas
___________________________________
Receive free SQL tips - www.ciquery.com/sqlserver.htm
<ha*****@yahoo.com> wrote in message
news:11*********************@p10g2000cwp.googlegro ups.com...
Hi All,
I am designing a purge process for a db that has grown to almost 200GB.
My purge process will remove about 1/3 of the 500 million rows spread
over seven tables. Currently there are about 35 indexes defined on
those seven tables. My question is will there be a performance gain by
dropping those indexes, doing my purge, and re-creating the indexes. I
am afraid that leaving those indexes in place will create a lot of
extra overhead in my delete statements by having to maintain the
indexes. I know that it could take many hours to rebuild the indexes
afterward, but I am planning on doing that anyway. The reason that I
want to know whether I should drop the indexes ahead of time, is I may
not be able to do the entire purge at once and the tables may need to
be accessed between purges. If this occurs, I will need to have those
indexes in place.

So do I drop the indexes before the purge and re-create them later or
do I leave them in place and re-index them afterward?

Thanks In Advance

p.h.

Mar 17 '06 #2

P: n/a
I understand that the tables are accessed frequently during the day. I
know that I have to be very sensitive to affecting the response times
of queries against these tables. I am not the DBA that owns this
database, so I do not have access to specific details about the queries
against the tables.

Mar 17 '06 #3

P: n/a
Without knowing the query access during this time it's hard to say, but
another idea
is you can create a Temporary Table (a persistent backup table with all the
date , not a sql temp table) and move the records there. (the ones you want
to keep)
Truncate your parent table. And move the records back to the original table
with the records you want.
Do the tables have clustered indexes?
--
Jack Vamvas
___________________________________
Receive free SQL tips - www.ciquery.com/sqlserver.htm
<ha*****@yahoo.com> wrote in message
news:11********************@p10g2000cwp.googlegrou ps.com...
I understand that the tables are accessed frequently during the day. I
know that I have to be very sensitive to affecting the response times
of queries against these tables. I am not the DBA that owns this
database, so I do not have access to specific details about the queries
against the tables.

Mar 17 '06 #4

P: n/a
Most of the tables do not have a clustered index, one does.

I thought about the temp table option, but the database may get rows
updated while purging the old rows. I would then lose those updates
when I trunc the parent table.

Mar 17 '06 #5

P: n/a
(ha*****@yahoo.com) writes:
I am designing a purge process for a db that has grown to almost 200GB.
My purge process will remove about 1/3 of the 500 million rows spread
over seven tables. Currently there are about 35 indexes defined on
those seven tables. My question is will there be a performance gain by
dropping those indexes, doing my purge, and re-creating the indexes. I
am afraid that leaving those indexes in place will create a lot of
extra overhead in my delete statements by having to maintain the
indexes. I know that it could take many hours to rebuild the indexes
afterward, but I am planning on doing that anyway. The reason that I
want to know whether I should drop the indexes ahead of time, is I may
not be able to do the entire purge at once and the tables may need to
be accessed between purges. If this occurs, I will need to have those
indexes in place.

So do I drop the indexes before the purge and re-create them later or
do I leave them in place and re-index them afterward?


If you really want to know - benchmark. In any case, you should not
run such a heavy operation in prodction, before testing it on a copy
of the live data. In a test environment, you can try different strategies.

Of course, if there is a requirement that that the database is accessible
while you are doing your purge, you will have to find a low-impact purges
that delete fairly small slices at time, and in this case dropping index
out the question.

Again, this why running this in a test environment is important. If
you can say "I was able to run the purge, including restoring of
indexes that I dropped, in eight hours", then it may be deemed that
it is acceptable to take the database offline.

There is another factor to it. If you drop and recreate the indexes,
you will address fragmentation in the indexes, that your purge inevitably
will cause. Worrying here, though, is that only one table has a a clustered
index. This means that the data pages of the other six tables will
reamin fragmented. For this reason, I would recommend that once your
purge has completed, that you build clustered indexes on these tables.
I would also recommend that you keep them in place, but if your DBA thinks
they are bad, you can just drop them. In such case, drop it before you
recreate the non-clustered indexes. (Else it is a costly operation to
drop clustered indexes.)
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Mar 17 '06 #6

P: n/a
I have been testing heavily on a full copy of the production data.
Unfortunately the test server that I was given is a much slower machine
than the production box.
In my test environment, doing the purge without the indexes is much
faster than purging with the indexes in place. However, it took over
17 hours to rebuild the indexes afterward.
My estimates suggest that purging without the indexes then rebuilding
them will take about 40 straight hours. If I try to do the purge with
the indexes in place, it will take about 200 hours.
If I can get a window large enough, dropping the indexes, purging, and
rebuilding looks like the way to go. If I cannot get that window, then
I have to do it over multiple weekends leaving the indexes in place.
The key is that the indexes must be in place at the end of the window
so that production processing can resume.
Of course these numbers will be much lower on the production box, but
since I cannot test there, I cannot guess how much lower they will be.

Mar 21 '06 #7

P: n/a
Stu
Rather than doing the delete continuously, you may want to delete the
rows in batches ordered by the clustered index column. It may take you
several hours, but at least it'll be a scripted job rather than
watching it do nothing. And, if something blows up, then if you commit
the transaction periodically, the rollback won't suck as much.

Mar 21 '06 #8

P: n/a
My process involves breaking the process into many smaller chunks.
Purging that chunk and doing a commit, then grab the next chunk. So
while it will run continuously during my available window, there will
be a commit after every chunk, which is a few thousand rows. The issue
is that I would like to be able to finish all of the chunks in one
window then rebuild the indexes.

Mar 21 '06 #9

P: n/a
ha*****@yahoo.com (ha*****@yahoo.com) writes:
I have been testing heavily on a full copy of the production data.
Unfortunately the test server that I was given is a much slower machine
than the production box.
That is probably a good thing. :-) At least, you will not get estimates
that are overly optimistic.
In my test environment, doing the purge without the indexes is much
faster than purging with the indexes in place. However, it took over
17 hours to rebuild the indexes afterward.
That's indeed a long time.
My estimates suggest that purging without the indexes then rebuilding
them will take about 40 straight hours. If I try to do the purge with
the indexes in place, it will take about 200 hours.


Hm, unless you have some extra non-business days around Easter you can
use, it sounds like a purge bit by bit with the indexes in place.

Of course, if you can get a window from Friday night to Monday morning,
that will suffice, but it will be a little nervous. Then again,
indexes can always be recreated by restoring a backup, but then all
purge job would be lost.

Stu's suggestion to go by clustered index is a good one, but I recall
that only one of your tables had one.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Mar 21 '06 #10

P: n/a
ha*****@yahoo.com (ha*****@yahoo.com) writes:
My process involves breaking the process into many smaller chunks.
Purging that chunk and doing a commit, then grab the next chunk. So
while it will run continuously during my available window, there will
be a commit after every chunk, which is a few thousand rows. The issue
is that I would like to be able to finish all of the chunks in one
window then rebuild the indexes.


A few thousand? That's too small! It depends on how little wide your
tables are, but I would suggest 100000 rows. To small batches can cost
you time, because it takes time to locate the chunks.

While it takes a lot of time, benchmarking is the only way to find out
what is a good value.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Mar 21 '06 #11

P: n/a

<ha*****@yahoo.com> wrote in message
news:11*********************@j33g2000cwa.googlegro ups.com...
I have been testing heavily on a full copy of the production data.
Unfortunately the test server that I was given is a much slower machine
than the production box.
In my test environment, doing the purge without the indexes is much
faster than purging with the indexes in place. However, it took over
17 hours to rebuild the indexes afterward.
One suggestion if you can. Move the indexes to separate spindles. (but not
the clustered index).

Also, Enterprise Server can parallize a lot of index work so may be faster.

(one job I run quarterly runs 3-4x faster on one box and it appears the
deciding factor is probably Enterprise vs. Standard.)

My estimates suggest that purging without the indexes then rebuilding
them will take about 40 straight hours. If I try to do the purge with
the indexes in place, it will take about 200 hours.
If I can get a window large enough, dropping the indexes, purging, and
rebuilding looks like the way to go. If I cannot get that window, then
I have to do it over multiple weekends leaving the indexes in place.
The key is that the indexes must be in place at the end of the window
so that production processing can resume.
Of course these numbers will be much lower on the production box, but
since I cannot test there, I cannot guess how much lower they will be.

Mar 22 '06 #12

P: n/a
boy, i sure am not a fan of clustered indexes, and for sure where you
are deleting chunks of data throughout, i do NOT understand any drive
to move to clustered indexes. But I'm sure listening and wanting to
learn!

The basic issues are understood. The sheer scale of this project leads
you to some interesting issues.
If you can knock off access to teh database, then do it in one fell
swoop. If not, then figure out an aritficial way to limit the deletes,
and do it in chunks. Rebuild the indexes one by one in off hours, and
just bite the bullet it will take a few weeks.

Mar 22 '06 #13

This discussion thread is closed

Replies have been disabled for this discussion.