
fastest way to delete 1 billion rows from a table haveing 9 billion records

P: 2
I have used the procedure below, which Serge posted a long time back, with BETWEEN date1 AND date2 in the WHERE clause and a commit count of 1 million.
But it is deleting only 250K rows/min. At this rate it will take me 3-4 days to delete everything.
The column I am using in the WHERE clause has a defined index.
Do you know of, or can you think of, a faster way to do this?
I wish we could load from /dev/null with a WHERE clause!

Thanks,

Alpesh.


-- Deletes rows matching <predicate> from <tabschema>.<tabname> in
-- batches of <commitcount> rows, committing after each batch so the
-- active log never has to hold the entire delete.
CREATE PROCEDURE DELETE_MANY_ROWS
  (tabschema VARCHAR(128),
   tabname VARCHAR(128),
   predicate VARCHAR(1000),
   commitcount INTEGER)
BEGIN
  DECLARE SQLCODE INTEGER;
  DECLARE txt VARCHAR(10000);
  DECLARE stmt STATEMENT;
  -- Delete through a fullselect capped by FETCH FIRST, so each
  -- EXECUTE removes at most <commitcount> rows.
  SET txt = 'DELETE FROM (SELECT 1 FROM "'
         || tabschema || '"."' || tabname || '" WHERE '
         || predicate || ' FETCH FIRST '
         || RTRIM(CHAR(commitcount)) || ' ROWS ONLY) AS D';
  PREPARE stmt FROM txt;
  l: LOOP
    EXECUTE stmt;
    -- SQLCODE 100: no rows left to delete
    IF SQLCODE = 100 THEN LEAVE l; END IF;
    COMMIT;
  END LOOP;
END
%

CALL DELETE_MANY_ROWS('SRIELAU', 'T1', 'c1 > 0', 1000)
%

Quiz!
How can performance be improved if the table has e.g. a known primary key?

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
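
One possible answer, sketched here under the assumption of an INTEGER primary key c1 on T1 (not confirmed as the answer Serge has in mind): delete in key ranges, so every pass is a narrow index range scan that starts where the previous one ended instead of crawling over already-deleted index entries.

-- Hedged sketch; names and range size are illustrative only.
DELETE FROM T1 WHERE c1 > 0 AND c1 <= 1000000;        -- pass 1
COMMIT;
DELETE FROM T1 WHERE c1 > 1000000 AND c1 <= 2000000;  -- pass 2
COMMIT;
-- ...and so on, range by range, driven from a script or from a
-- variant of the procedure above that appends the range bounds
-- to the predicate.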
Aug 16 '07 #1
5 Replies


docdiesel
Expert 100+
P: 297
Hi Alpesh,

does the table always have to be online? If not, this approach may help:

1) create a 2nd table NEWTABLE with the same structure as OLDTABLE
2) create a cursor as "select ... from OLDTABLE where NOT delete_condition"
3) load NEWTABLE from cursor
4) drop OLDTABLE
5) create a view named OLDTABLE as "select .. from NEWTABLE"

You've got to move more rows than are to be deleted, but because LOAD bypasses the logs and is therefore rather fast, this may be faster than the log-bound delete method. The steps are sketched in commands just below.
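
A sketch of steps 1-5 in DB2 CLP commands; the table names and the delete condition are placeholders, and NONRECOVERABLE assumes you can take a backup afterwards:

CREATE TABLE NEWTABLE LIKE OLDTABLE;
-- select everything that should survive the delete
DECLARE cur1 CURSOR FOR SELECT * FROM OLDTABLE
    WHERE NOT (<delete_condition>);
-- LOAD from the cursor instead of INSERT: minimal logging, much faster
LOAD FROM cur1 OF CURSOR INSERT INTO NEWTABLE NONRECOVERABLE;
DROP TABLE OLDTABLE;
-- keep the old name working for existing applications
CREATE VIEW OLDTABLE AS SELECT * FROM NEWTABLE;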

Regards,


Bernd
Aug 16 '07 #2

P: 2
Bernd, the problem is that the table, even without the data to be deleted, is 1.5 TB, and I do not have that kind of storage available to move the data.
Trust me, that would have been the first thing I'd have done if I had that storage.
Aug 16 '07 #3

docdiesel
Expert 100+
P: 297
Hi Alpesh,

Um... aah... yes, I see, 1.5 TB is quite big. I wouldn't like to have to move that either.

So, I guess you're working with DB2 on z/OS? I'm not familiar with that one (yet). But with my DB2 on Linux I'd try another approach, consisting of

* playing with the commit count,
* increasing the bufferpool,
* increasing the size and number of logs,
* spreading the logs over different disks.
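
For example (DB2 LUW commands; the database name, paths and values are placeholders only, to be sized for your system):

UPDATE DB CFG FOR MYDB USING LOGFILSIZ 65536;            -- larger log files
UPDATE DB CFG FOR MYDB USING LOGPRIMARY 20 LOGSECOND 40; -- more log files
UPDATE DB CFG FOR MYDB USING NEWLOGPATH /fastdisk/db2logs;
ALTER BUFFERPOOL IBMDEFAULTBP SIZE 200000;               -- more bufferpool pages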

I don't know the structure of the data within the table, but primary keys and/or indexes, and a SELECT based on such a column, should help significantly. (I once had a mass insert based on a bigger, joined SELECT and interrupted it after 24 hours. After adding an index it was done in 15 minutes!)

Regards,

Bernd
Aug 16 '07 #4

P: 7
Maybe using "not logged initially" is another way to improve the performance.
Aug 17 '07 #5

P: 17
Before deleting, activate NOT LOGGED INITIALLY, and issue a COMMIT after the delete - this should improve the performance.
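
For example (a sketch only; T1 and the predicate are placeholders, the ALTER assumes T1 was created with the NOT LOGGED INITIALLY attribute, autocommit must be off so both statements share one unit of work, and if that unit of work fails the table can become unusable):

ALTER TABLE T1 ACTIVATE NOT LOGGED INITIALLY;
DELETE FROM T1 WHERE c1 > 0;  -- not logged within this unit of work
COMMIT;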
Aug 17 '07 #6
