469,647 Members | 1,797 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,647 developers. It's quick & easy.

Slow deletes

Raj
We have a batch process that deletes about 7-8 million records every
day
& takes about 30-35 mins to complete. Is there a way to optimize
deletes? Below is the statement

delete from fact where (key,dt) in (select key,dt from dim1 union
select key,dt from dim2)

Fact table has composite index on key,someothecolumn,dt.
>From the access plan, i see that the index is being used.

Can we implement parallel deletes or tune any other db paramaeter ? We
are on db2 udb v8.2 DPF on 10 logical nodes.
Thanks
Raj

Nov 30 '06 #1
6 3592
Raj wrote:
We have a batch process that deletes about 7-8 million records every
day
& takes about 30-35 mins to complete. Is there a way to optimize
deletes? Below is the statement

delete from fact where (key,dt) in (select key,dt from dim1 union
select key,dt from dim2)

Fact table has composite index on key,someothecolumn,dt.
From the access plan, i see that the index is being used.


Can we implement parallel deletes or tune any other db paramaeter ? We
are on db2 udb v8.2 DPF on 10 logical nodes.
Thanks
Raj
First i'd changes the UNION to a UNION ALL. Probably don't want to run
a DISTINCT before it does the DELETE. Or, just run it in two
statements.

Second, i'd try EXISTS over IN(). Which such a large list IIUC, EXISTS
will utilize the INDEX:

DELETE FROM Fact WHERE EXISTS
(SELECT * FROM FROM Dim1 Dim WHERE Dim.Key = Fact.Key AND Dim.Dt =
Fact.Dt)

DELETE FROM Fact WHERE EXISTS
(SELECT * FROM FROM Dim2 Dim WHERE Dim.Key = Fact.Key AND Dim.Dt =
Fact.Dt)

Another idea is that DELETEing in batches is usually faster. Simplest
way would be to create another table and store all records to be
DELETEd there. Then, DELETE perhaps 10k rows at a time.

CREATE TABLE Fact_Delete(Key INT, Dt INT);
INSERT INTO Fact_Delete
(SELECT Key, Dt FROM Dim1 UNION SELECT Key, Dt FROM Dim2)

DELETE FROM Fact WHERE (Key,Dt) IN
(SELECT Key, Dt FROM Fact_Delete ORDER BY Key FETCH FIRST 10000 ROWS
ONLY)
DELETE FROM Fact_Delete ORDER BY Key FETCH FIRST 10000 ROWS ONLY

Just some ideas.

B.

Nov 30 '06 #2
Raj
Thanks!! I'll try the frequent commit option,
I tried exists but plan has very high timerons vs in.

Will a parallel delete work i.e. running the following in parallel;

delete from fact where (key,dt) in (select key,dt from dim1 union
select key,dt from dim2) and PARTITION(key)=1

delete from fact where (key,dt) in (select key,dt from dim1 union
select key,dt from dim2) and PARTITION(key)=2
..
..
..
..
..
..
dim1 & dim2 are small tables so i will replicate them in all nodes

Brian Tkatch wrote:

Nov 30 '06 #3
Raj wrote:
Thanks!! I'll try the frequent commit option,
I tried exists but plan has very high timerons vs in.
If you have the time it still might be worth trying the exists. The
timerons aren't always 100% accurate - at least, not in my experience.

Another possible option to speed up deletes is to use MDC's on the fact
table. If you cluster on key and dt and turn on MDC Rollout, then the
deletes should go much, much faster (and cause much less logging
overhead). Of course, there are a lot of other considerations to take
into account before doing this - such as disc space usage and the fact
that you would have to completely reload your table(s). Also, I'm
basing this suggestion solely on the literature about MDC's - I haven't
(yet) taken advantage of this so can't give a definite thumbs up.

-Chris

Nov 30 '06 #4
Raj wrote:
Thanks!! I'll try the frequent commit option,
Neat. That's what i did at one point when i ahd the same issue. :)

Note that if you need a TABLE, you don't actually have to CREATE a
TABLE, a GLOBAL TEMPORARY TABLE should work just as well.
I tried exists but plan has very high timerons vs in.

Will a parallel delete work i.e. running the following in parallel;
Way out of my league. Never used the stuff.

B.

>
delete from fact where (key,dt) in (select key,dt from dim1 union
select key,dt from dim2) and PARTITION(key)=1

delete from fact where (key,dt) in (select key,dt from dim1 union
select key,dt from dim2) and PARTITION(key)=2
.
.
.
.
.
.
dim1 & dim2 are small tables so i will replicate them in all nodes

Brian Tkatch wrote:
Nov 30 '06 #5
The DELETE should parallelize on it's on.
Check the explain.

FETCH FIRST and parallelism don't go well together...

Cheers
Serge

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

WAIUG Conference
http://www.iiug.org/waiug/present/Fo...Forum2006.html
Nov 30 '06 #6
If that was z/OS, we would use REORG with DISCARD or UNLOAD/SORT w INCLUDE
to delete unwanted rows/LOAD REPLACE....

Thing to remember is that delete/insert use LOGGING and does slow down the
process (imagine logging 8 million rows, granted that you have frequent
commits - which further slow it down - you've still logged 8 million records
before images). Utilities like LOAD/REORG avoid logging and saves heaps on
resources.... Also, after deletion of so many rows, you would need to reorg
anyway......

HTH,
Ven

"Raj" <sp****@yahoo.comwrote in message
news:11*********************@n67g2000cwd.googlegro ups.com...
We have a batch process that deletes about 7-8 million records every
day
& takes about 30-35 mins to complete. Is there a way to optimize
deletes? Below is the statement

delete from fact where (key,dt) in (select key,dt from dim1 union
select key,dt from dim2)

Fact table has composite index on key,someothecolumn,dt.
>>From the access plan, i see that the index is being used.


Can we implement parallel deletes or tune any other db paramaeter ? We
are on db2 udb v8.2 DPF on 10 logical nodes.
Thanks
Raj

Dec 2 '06 #7

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

3 posts views Thread by Jeremy Howard | last post: by
2 posts views Thread by Craig Stadler | last post: by
16 posts views Thread by Dave Weaver | last post: by
7 posts views Thread by mp | last post: by
3 posts views Thread by jdipalmajr | last post: by
2 posts views Thread by Brock Henry | last post: by
14 posts views Thread by Michel Esber | last post: by
reply views Thread by Raj | last post: by
reply views Thread by gheharukoh7 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.