Bytes | Software Development & Data Engineering Community

Delete Many Rows

Hi All,
I am designing a purge process for a db that has grown to almost 200GB.
My purge process will remove about 1/3 of the 500 million rows spread
over seven tables. Currently there are about 35 indexes defined on
those seven tables. My question is: will there be a performance gain by
dropping those indexes, doing my purge, and re-creating the indexes? I
am afraid that leaving those indexes in place will create a lot of
extra overhead in my delete statements by having to maintain the
indexes. I know that it could take many hours to rebuild the indexes
afterward, but I am planning on doing that anyway. The reason I want
to know whether I should drop the indexes ahead of time is that I may
not be able to do the entire purge at once, and the tables may need to
be accessed between purges. If this occurs, I will need to have those
indexes in place.

So do I drop the indexes before the purge and re-create them later or
do I leave them in place and re-index them afterward?

Thanks In Advance

p.h.

Mar 17 '06 #1
How often are queries run on these tables?

--
Jack Vamvas
___________________________________
Receive free SQL tips - www.ciquery.com/sqlserver.htm

Mar 17 '06 #2
I understand that the tables are accessed frequently during the day. I
know that I have to be very sensitive to affecting the response times
of queries against these tables. I am not the DBA that owns this
database, so I do not have access to specific details about the queries
against the tables.

Mar 17 '06 #3
Without knowing the query access during this time it's hard to say, but
another idea is to create a staging table (a persistent backup table with
all the data, not a SQL temp table) and move the records you want to keep
there. Then truncate your parent table, and move those records back into
the original table.
Do the tables have clustered indexes?
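
The copy-out / truncate / copy-back idea above might look like this sketch. All table and column names (dbo.Orders, OrderDate) are placeholders, and note that TRUNCATE TABLE will fail if foreign keys reference the table:

```sql
-- Copy the rows you want to KEEP into a persistent work table:
SELECT *
INTO   dbo.Orders_Keep
FROM   dbo.Orders
WHERE  OrderDate >= '20050101';

-- Fast, minimally logged removal of everything:
TRUNCATE TABLE dbo.Orders;

-- Put the kept rows back:
INSERT INTO dbo.Orders
SELECT * FROM dbo.Orders_Keep;

DROP TABLE dbo.Orders_Keep;
```

This only works safely if nothing modifies the parent table between the copy-out and the copy-back.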

Mar 17 '06 #4
Most of the tables do not have a clustered index, one does.

I thought about the temp table option, but the database may get rows
updated while purging the old rows. I would then lose those updates
when I truncate the parent table.

Mar 17 '06 #5
(ha*****@yahoo.com) writes:
> So do I drop the indexes before the purge and re-create them later or
> do I leave them in place and re-index them afterward?

If you really want to know - benchmark. In any case, you should not
run such a heavy operation in production before testing it on a copy
of the live data. In a test environment, you can try different strategies.

Of course, if there is a requirement that the database be accessible
while you are doing your purge, you will have to find a low-impact purge
that deletes fairly small slices at a time, and in this case dropping the
indexes is out of the question.

Again, this is why running this in a test environment is important. If
you can say "I was able to run the purge, including restoring the
indexes that I dropped, in eight hours", then it may be deemed
acceptable to take the database offline.

There is another factor to it. If you drop and recreate the indexes,
you will address the fragmentation in the indexes that your purge will
inevitably cause. Worrying here, though, is that only one table has a
clustered index. This means that the data pages of the other six tables
will remain fragmented. For this reason, I would recommend that once your
purge has completed, you build clustered indexes on these tables.
I would also recommend that you keep them in place, but if your DBA thinks
they are bad, you can just drop them. In that case, drop the clustered
index before you recreate the non-clustered indexes. (Otherwise, dropping
a clustered index is a costly operation, since every non-clustered index
has to be rebuilt.)
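
A minimal sketch of that sequence, assuming hypothetical names throughout (dbo.BigTable, SomeKey, OtherCol, and the index names):

```sql
-- 1. After the purge, build a clustered index to compact the data pages:
CREATE CLUSTERED INDEX CIX_BigTable
    ON dbo.BigTable (SomeKey);

-- 2. If the DBA wants the table back as a heap, drop the clustered
--    index BEFORE recreating the non-clustered indexes; dropping it
--    afterward would force every non-clustered index to be rebuilt:
DROP INDEX dbo.BigTable.CIX_BigTable;  -- SQL Server 2000-style syntax

-- 3. Only then recreate the non-clustered indexes:
CREATE INDEX IX_BigTable_OtherCol
    ON dbo.BigTable (OtherCol);
```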
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Mar 17 '06 #6
I have been testing heavily on a full copy of the production data.
Unfortunately the test server that I was given is a much slower machine
than the production box.
In my test environment, doing the purge without the indexes is much
faster than purging with the indexes in place. However, it took over
17 hours to rebuild the indexes afterward.
My estimates suggest that purging without the indexes then rebuilding
them will take about 40 straight hours. If I try to do the purge with
the indexes in place, it will take about 200 hours.
If I can get a window large enough, dropping the indexes, purging, and
rebuilding looks like the way to go. If I cannot get that window, then
I have to do it over multiple weekends leaving the indexes in place.
The key is that the indexes must be in place at the end of the window
so that production processing can resume.
Of course these numbers will be much lower on the production box, but
since I cannot test there, I cannot guess how much lower they will be.

Mar 21 '06 #7
Stu
Rather than running the delete as one continuous statement, you may want
to delete the rows in batches ordered by the clustered index column. It
may take several hours, but at least it'll be a scripted job rather than
watching it do nothing. And if something blows up, committing the
transaction periodically means the rollback won't hurt as much.
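
A sketch of that batched-delete pattern for SQL Server 2000/2005, using SET ROWCOUNT; the table, column, cutoff date, and batch size are all placeholders:

```sql
-- Delete in bounded batches so each implicit transaction stays small.
SET ROWCOUNT 100000;           -- cap every DELETE at one batch
WHILE 1 = 1
BEGIN
    DELETE FROM dbo.BigTable
    WHERE  CreatedDate < '20050101';

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to purge
    -- Each DELETE commits on its own here, so an interruption only
    -- loses the in-flight batch, not the whole purge.
END
SET ROWCOUNT 0;                -- restore the default
```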

Mar 21 '06 #8
My process involves breaking the purge into many smaller chunks:
purge a chunk, commit, then grab the next chunk. So while it will run
continuously during my available window, there will be a commit after
every chunk, which is a few thousand rows. The issue is that I would
like to be able to finish all of the chunks in one window and then
rebuild the indexes.

Mar 21 '06 #9
ha*****@yahoo.com (ha*****@yahoo.com) writes:
> I have been testing heavily on a full copy of the production data.
> Unfortunately the test server that I was given is a much slower machine
> than the production box.

That is probably a good thing. :-) At least, you will not get estimates
that are overly optimistic.

> In my test environment, doing the purge without the indexes is much
> faster than purging with the indexes in place. However, it took over
> 17 hours to rebuild the indexes afterward.

That's indeed a long time.

> My estimates suggest that purging without the indexes then rebuilding
> them will take about 40 straight hours. If I try to do the purge with
> the indexes in place, it will take about 200 hours.

Hm, unless you have some extra non-business days around Easter you can
use, it sounds like a purge bit by bit with the indexes in place.

Of course, if you can get a window from Friday night to Monday morning,
that will suffice, but it will be a little nerve-wracking. Then again,
the indexes can always be recreated by restoring a backup, but then all
the purge work would be lost.

Stu's suggestion to go by the clustered index is a good one, but I recall
that only one of your tables has one.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Mar 21 '06 #10
ha*****@yahoo.com (ha*****@yahoo.com) writes:
> My process involves breaking the process into many smaller chunks.
> Purging that chunk and doing a commit, then grab the next chunk. So
> while it will run continuously during my available window, there will
> be a commit after every chunk, which is a few thousand rows. The issue
> is that I would like to be able to finish all of the chunks in one
> window then rebuild the indexes.

A few thousand? That's too small! It depends on how wide your
tables are, but I would suggest 100,000 rows. Too-small batches can cost
you time, because it takes time to locate each chunk.

While it takes a lot of time, benchmarking is the only way to find out
what is a good value.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Mar 21 '06 #11

<ha*****@yahoo.com> wrote in message
news:11*********************@j33g2000cwa.googlegroups.com...
> I have been testing heavily on a full copy of the production data.
> Unfortunately the test server that I was given is a much slower machine
> than the production box.
> In my test environment, doing the purge without the indexes is much
> faster than purging with the indexes in place. However, it took over
> 17 hours to rebuild the indexes afterward.

One suggestion, if you can: move the indexes to separate spindles (but not
the clustered index).

Also, Enterprise Edition can parallelize a lot of index work, so it may be
faster.

(One job I run quarterly runs 3-4x faster on one box, and it appears the
deciding factor is probably Enterprise vs. Standard.)
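
A hedged sketch of moving a non-clustered index onto its own filegroup (and thus separate spindles) as part of a rebuild; INDEXES_FG and the table, column, and index names are placeholders, and the filegroup must already exist:

```sql
-- Rebuild the existing index and place it on a dedicated filegroup
-- in one step (WITH DROP_EXISTING avoids a separate DROP INDEX):
CREATE INDEX IX_BigTable_Col1
    ON dbo.BigTable (Col1)
    WITH DROP_EXISTING
    ON [INDEXES_FG];
```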


Mar 22 '06 #12
Boy, I sure am not a fan of clustered indexes, and especially where you
are deleting chunks of data throughout, I do NOT understand the drive
to move to clustered indexes. But I'm sure listening and wanting to
learn!

The basic issues are understood. The sheer scale of this project leads
you to some interesting issues.
If you can knock off access to the database, then do it in one fell
swoop. If not, then figure out an artificial way to limit the deletes
and do it in chunks. Rebuild the indexes one by one in off hours, and
just bite the bullet: it will take a few weeks.
Mar 22 '06 #13
