Bytes IT Community

deleting large numbers of records

We have a batch process that inserts large numbers (100,000 - 1,000,000) of
records into a database each day. (DL/I database.) We're considering
converting it to a DB2 table. Currently we have logic in place that, prior
to inserting any data, reads the first input record and checks to see if it
already exists in the table. If the record already exists there are two
options:
1) Don't continue, because you already ran this job today!
2) This is a rerun - continue.

If number 2 is selected the first thing that happens is that it deletes
every record that was inserted today prior to doing the regular insert
process. (You may ask, why not just skip over the ones that are already
there. It's because we may be rerunning with an updated input file, where
the input records may be different than during the first run.)

Anyway, I figured with DB2 this would be a snap. All I'd need to do is:
EXEC SQL
DELETE FROM FILM.FILM_TRANSACTIONS
WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
END-EXEC

The only problem is that my log file would end up running out of room. So
now I've come up with the following:

DELETE-TODAY SECTION.
DISPLAY 'DELETE PROCESS BEGINS' UPON CONSOLE
PERFORM WITH TEST AFTER
UNTIL SQLCODE = 100
DISPLAY 'COMMITTING...' UPON CONSOLE
PERFORM COMMIT-UOW
DISPLAY 'DELETING 10000' UPON CONSOLE
PERFORM DB2-DELETE-TODAY
END-PERFORM
PERFORM COMMIT-UOW
DISPLAY 'DELETE PROCESS ENDS' UPON CONSOLE
.

DB2-DELETE-TODAY SECTION.
EXEC SQL
DELETE FROM (
SELECT UPDATE_DATE
FROM FILM.FILM_TRANSACTIONS
WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
FETCH FIRST 10000 ROWS ONLY
)
WHERE 1 = 1
END-EXEC
CALL CHECKERR USING SQLCA ERRLOC
.

My question is, is this the way to go or is there some better way?

I tried making the "10000" a host variable, but that didn't work. Any way
around this?

You may be wondering why I put the "WHERE 1 = 1" clause on the DELETE
statement. This is because DB2 gives a warning if you pre-compile a DELETE
or UPDATE statement without a WHERE clause. Still works, but I like to
avoid warnings.

Thanks!
Frank
---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
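[Editor's sketch] The loop above commits, deletes the next chunk, and stops when the DELETE reports no rows found. A minimal sketch of the same chunked-delete pattern, with Python's sqlite3 standing in for DB2 (the table name and the sqlite-specific rowid/LIMIT phrasing are illustrative substitutes, not the DB2 syntax above):

```python
import sqlite3

def delete_in_chunks(conn, update_date, chunk=10000):
    """Delete all rows for one date, committing after each chunk so no
    single unit of work overflows the transaction log."""
    cur = conn.cursor()
    total = 0
    while True:
        # Emulate DB2's DELETE FROM (SELECT ... FETCH FIRST n ROWS ONLY)
        cur.execute(
            "DELETE FROM film_transactions WHERE rowid IN ("
            " SELECT rowid FROM film_transactions"
            " WHERE update_date = ? LIMIT ?)",
            (update_date, chunk),
        )
        deleted = cur.rowcount
        conn.commit()          # end the unit of work, freeing log space
        total += deleted
        if deleted == 0:       # nothing left for this date (SQLCODE 100)
            break
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film_transactions (update_date TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO film_transactions VALUES (?, ?)",
    [("2007-01-24", str(i)) for i in range(25000)] + [("2007-01-23", "x")],
)
conn.commit()
print(delete_in_chunks(conn, "2007-01-24"))  # prints 25000
```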
Jan 24 '07 #1
24 Replies


"Frank Swarbrick" <Fr*************@efirstbank.com> wrote:
We have a batch process that inserts large numbers (100,000 - 1,000,000) of
records into a database each day. (DL/I database.) We're considering
converting it to a DB2 table. Currently we have logic in place that, prior
to inserting any data, reads the first input record and checks to see if it
already exists in the table. If the record already exists there are two
options:
1) Don't continue, because you already ran this job today!
2) This is a rerun - continue.

If number 2 is selected the first thing that happens is that it deletes
every record that was inserted today prior to doing the regular insert
process. (You may ask, why not just skip over the ones that are already
there. It's because we may be rerunning with an updated input file, where
the input records may be different than during the first run.)

Anyway, I figured with DB2 this would be a snap. All I'd need to do is:
EXEC SQL
DELETE FROM FILM.FILM_TRANSACTIONS
WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
END-EXEC

The only problem is that my log file would end up running out of room. So
now I've come up with the following:

DELETE-TODAY SECTION.
DISPLAY 'DELETE PROCESS BEGINS' UPON CONSOLE
PERFORM WITH TEST AFTER
UNTIL SQLCODE = 100
DISPLAY 'COMMITTING...' UPON CONSOLE
PERFORM COMMIT-UOW
DISPLAY 'DELETING 10000' UPON CONSOLE
PERFORM DB2-DELETE-TODAY
END-PERFORM
PERFORM COMMIT-UOW
DISPLAY 'DELETE PROCESS ENDS' UPON CONSOLE
.

DB2-DELETE-TODAY SECTION.
EXEC SQL
DELETE FROM (
SELECT UPDATE_DATE
FROM FILM.FILM_TRANSACTIONS
WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
FETCH FIRST 10000 ROWS ONLY
)
WHERE 1 = 1
END-EXEC
CALL CHECKERR USING SQLCA ERRLOC
.

My question is, is this the way to go or is there some better way?

I tried making the "10000" a host variable, but that didn't work. Any way
around this?

You may be wondering why I put the "WHERE 1 = 1" clause on the DELETE
statement. This is because DB2 gives a warning if you pre-compile a DELETE
or UPDATE statement without a WHERE clause. Still works, but I like to
avoid warnings.

Thanks!
Frank
---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Yes there is a better way that will avoid filling up your DB2 z/OS logs.

Declare a cursor with something like:
SELECT UPDATE_DATE FROM FILM.FILM_TRANSACTIONS
WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED) FOR
UPDATE WITH HOLD

OPEN cursor-name.

Then FETCH one row at a time (until RC = 100).

DELETE WHERE CURRENT OF C1 (cursor name).

COMMIT. You could commit every 100 - 10000 if you don't want to commit each
row.

CLOSE cursor-name.

Note that the WITH HOLD will retain the cursor position even though you have
committed (which usually closes a cursor).

Please consult the SQL Reference for details, because I am posting all the
above syntax from (my) memory.

Also, you find that there is very little increase in performance in
committing more often than every 100 updates. Committing more often will
provide increased concurrency with other applications.
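[Editor's sketch] The fetch-and-delete loop Mark describes can be sketched in Python with sqlite3. sqlite3 has no DELETE WHERE CURRENT OF, so each fetched row is deleted by its rowid instead -- an approximation of the cursor pattern, not DB2 syntax:

```python
import sqlite3

def cursor_style_delete(conn, update_date, commit_every=100):
    """Fetch qualifying rows one at a time and delete each individually,
    committing every `commit_every` rows -- the cursor WITH HOLD pattern.
    Each fetched row is deleted by its rowid, standing in for
    DELETE WHERE CURRENT OF."""
    read = conn.cursor()
    read.execute("SELECT rowid FROM film_transactions WHERE update_date = ?",
                 (update_date,))
    deleted = 0
    for (rid,) in read.fetchall():       # FETCH until no more rows
        conn.execute("DELETE FROM film_transactions WHERE rowid = ?", (rid,))
        deleted += 1
        if deleted % commit_every == 0:  # periodic COMMIT keeps the log small
            conn.commit()
    conn.commit()                        # final COMMIT, then CLOSE the cursor
    read.close()
    return deleted

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film_transactions (update_date TEXT)")
conn.executemany("INSERT INTO film_transactions VALUES (?)",
                 [("2007-01-24",)] * 250 + [("2007-01-23",)] * 5)
conn.commit()
print(cursor_style_delete(conn, "2007-01-24"))  # prints 250
```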
Jan 24 '07 #2

"Mark A" <no****@nowhere.com> wrote:
Also, you find that there is very little increase in performance in
committing more often than every 100 updates. Committing more often will
provide increased concurrency with other applications.
Correction, you find that there is very little increase in performance in
committing LESS often than every 100 updates.
Jan 24 '07 #3

I think both solutions have merit. Frank, your version is very much
like the approach outlined in Serge's "SQL on Fire" seminars, albeit in
COBOL as opposed to SQL/PL. Mark's solution will use less log space
than yours, but I respectfully wonder about the time required to delete
all of the target rows when they're being fetched and deleted
one-at-a-time.

FWIW, here's how I've done it (in a stored procedure):

CREATE PROCEDURE CSE.PURGE_LRD()
BEGIN

DECLARE V_NO_DATA SMALLINT DEFAULT 0;--
DECLARE V_DEADLOCK_OR_LTO SMALLINT DEFAULT 0;--

DECLARE C_DEADLOCK_OR_LTO CONDITION FOR SQLSTATE '40001';--

DECLARE CONTINUE HANDLER FOR NOT FOUND
SET V_NO_DATA = 1;--

-- The V_DEADLOCK_OR_LTO attribute is throw-away,
-- but a continue handler needs to do something,
-- i.e., it's not enough to just declare a handler,
-- it has to have an action in its body.
DECLARE CONTINUE HANDLER FOR C_DEADLOCK_OR_LTO
SET V_DEADLOCK_OR_LTO = 1;--

WHILE (V_NO_DATA = 0) DO
DELETE FROM
(
SELECT
1
FROM
LRD
FETCH FIRST 200000 ROWS ONLY
) AS LRD_D;--

COMMIT;--
END WHILE;--
END;

--Jeff
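[Editor's sketch] A rough Python rendering of PURGE_LRD's control flow, with sqlite3 standing in for DB2. The retry on sqlite3.OperationalError plays the role of the SQLSTATE '40001' continue handler -- an approximation only, since real deadlock handling is engine-specific:

```python
import sqlite3

def purge_lrd(conn, batch=200000):
    """Python analogue of the PURGE_LRD procedure: delete in capped
    batches until NOT FOUND, treating a deadlock/lock-timeout on one
    batch as retryable rather than fatal (the CONTINUE HANDLER above)."""
    purged = 0
    while True:
        try:
            cur = conn.execute(
                "DELETE FROM lrd WHERE rowid IN"
                " (SELECT rowid FROM lrd LIMIT ?)", (batch,))
        except sqlite3.OperationalError:
            conn.rollback()    # deadlock victim: retry this batch next pass
            continue
        conn.commit()
        if cur.rowcount == 0:  # NOT FOUND: table is empty
            break
        purged += cur.rowcount
    return purged

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lrd (x INTEGER)")
conn.executemany("INSERT INTO lrd VALUES (?)", [(i,) for i in range(500)])
conn.commit()
print(purge_lrd(conn, batch=200))  # prints 500
```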

Jan 25 '07 #4

Mark A <no****@nowhere.com> wrote on 01/24/07 4:36 PM:
>Yes there is a better way that will avoid filling up your DB2 z/OS logs.
Actually, DB2/LUW, but I'm guessing your advice still applies.
>Declare a cursor with something like:
SELECT UPDATE_DATE FROM FILM.FILM_TRANSACTIONS
WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED) FOR
UPDATE WITH HOLD

OPEN cursor-name.

Then FETCH one row at a time (until RC = 100).

DELETE WHERE CURRENT OF C1 (cursor name).

COMMIT. You could commit every 100 - 10000 if you don't want to commit each
>row.

CLOSE cursor-name.
Interesting. I just figured that this would be much less efficient than
doing just the delete with the fullselect, because in the latter case no
data need be returned to the AR. Anyway, I will give it a shot.
>Note that the WITH HOLD will retain the cursor position even though you
have
>committed (which usually closes a cursor).

Please consult the SQL Reference for details, because I am posting all the
>above syntax from (my) memory.

Also, you find that there is very little increase in performance in
committing more often than every 100 updates. Committing more often will
provide increased concurrency with other applications.
Will keep that in mind. Thanks!

Frank

---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Jan 25 '07 #5

Oh! So every 10,000 actually does make sense then, right?
Mark A <no****@nowhere.com> wrote on 01/24/07 4:44 PM:
Also, you find that there is very little increase in performance in
committing more often than every 100 updates. Committing more often will
provide increased concurrency with other applications.
Correction, you find that there is very little increase in performance in
committing LESS often than every 100 updates.


Jan 25 '07 #6

jefftyzzer <je********@sbcglobal.net> wrote on 01/24/07 5:54 PM:
>I think both solutions have merit. Frank, your version is very much
like the approach outlined in Serge's "SQL on Fire" seminars, albeit in
COBOL as opposed to SQL/PL. Mark's solution will use less log space
than yours, but I respectfully wonder about the time required to delete
all of the target rows when they're being fetched and deleted
one-at-a-time.
That was my concern as well. In any case, I will give both methods a shot
and see which I prefer.
My probably very naive thought is that it would be nice to have some sort of
DELETE statement that didn't even do logging. While often (usually?) you
would want to be able to ROLLBACK a DELETE, in the case of what I'm doing
there's no reason I would ever want to rollback. So why log? Just
wondering...

Thanks!

Frank

---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Jan 25 '07 #7

"Frank Swarbrick" <Fr*************@efirstbank.com> wrote:
Oh! So every 10,000 actually does make sense then, right?
No. I would commit every 100 - 1000 rows. A commit is not that expensive,
unless you do it for every row. Doing it every 100 rows or 10,000 will
provide about a 2% difference in performance at best.

The major expense of a commit is a synchronous write of the log buffer to
disk. But the log buffer will be flushed to disk anyway whenever the log
buffer is full, or about every 1 second. In any case, a commit is usually
only as expensive as a few deletes, so do the math.
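[Editor's sketch] Mark's "do the math" can be made concrete with a toy cost model. The delete and commit costs below are illustrative assumptions, not measurements; the point is the shape of the curve -- a commit interval of 100 is already within a few percent of any larger interval:

```python
def relative_cost(rows, commit_every, delete_cost=1.0, commit_cost=5.0):
    """Toy model: total work for `rows` deletes committed every
    `commit_every` rows, normalized to the cost of the deletes alone.
    Assumes a commit costs about as much as a few deletes."""
    commits = -(-rows // commit_every)  # ceiling division
    return (rows * delete_cost + commits * commit_cost) / (rows * delete_cost)

for interval in (1, 10, 100, 1000, 10000):
    print(interval, round(relative_cost(1_000_000, interval), 4))
# Committing every row costs 6x the deletes alone; every 100 rows,
# only 5% extra; committing less often than that buys almost nothing.
```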
Jan 25 '07 #8

"Frank Swarbrick" <Fr*************@efirstbank.com> wrote:
Interesting. I just figured that this would be much less efficient than
doing just the delete with the fullselect, because in the latter case no
data need be returned to the AR. Anyway, I will give it a shot.
It depends on where the program runs. If the program runs on the server and the
static SQL is bound into a package that runs on the server, then there is
not that much difference in performance (unless performance is
ultra-critical). If the program is running remotely, then there would be a
big difference in performance.

I have written SQL stored procedures to do mass deletes with a cursor and it
performs well.

I haven't seen too many COBOL programs running on DB2 LUW. What compiler are
you using? I used MicroFocus COBOL against OS/2 Database Manager, but that
was in 1991.
Jan 25 '07 #9

Frank Swarbrick wrote:
jefftyzzer <je********@sbcglobal.net> wrote on 01/24/07 5:54 PM:
>>I think both solutions have merit. Frank, your version is very much
like the approach outlined in Serge's "SQL on Fire" seminars, albeit in
COBOL as opposed to SQL/PL. Mark's solution will use less log space
than yours, but I respectfully wonder about the time required to delete
all of the target rows when they're being fetched and deleted
one-at-a-time.

That was my concern as well. In any case, I will give both methods a shot
and see which I prefer.
My probably very naive thought is that it would be nice to have some sort of
DELETE statement that didn't even do logging. While often (usually?) you
would want to be able to ROLLBACK a DELETE, in the case of what I'm doing
there's no reason I would ever want to rollback. So why log? Just
wondering...
The logs are also used for crash recovery. Let's assume you run the
unlogged DELETE. Now your application or the DB2 server crashes before you
issued a COMMIT. Upon restart, DB2 has to make sure the transaction is
properly rolled back and the database is in a consistent state. If you
don't log the DELETE, you are out of luck there.

What would be nice to have in this respect is an option for the DELETE
statement to explicitly turn off logging - which would have a certain
amount of problems as I just mentioned. Truncating a whole table is
supported that way already: you can use ALTER TABLE ... ACTIVATE NOT LOGGED
INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
partitioning is an option for you?

--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany
Jan 25 '07 #10

Mark A wrote:
The major expense of a commit is a synchronous write of the log buffer to
disk.
Not necessarily if group commit is used.

--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany
Jan 25 '07 #11

Frank Swarbrick wrote:
>Mark A <no****@nowhere.com> wrote on 01/24/07 4:36 PM:
>>Yes there is a better way that will avoid filling up your DB2 z/OS logs.

Actually, DB2/LUW, but I'm guessing your advice still applies.
In that case, have you considered the MERGE statement? Then you may not
have to DELETE the rows at all - just UPDATE them.

--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany
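[Editor's sketch] A sketch of Knut's MERGE idea using sqlite3's upsert syntax (INSERT ... ON CONFLICT DO UPDATE) in place of DB2's MERGE; the table, key, and column names are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE film_transactions (
                  tran_id INTEGER PRIMARY KEY,
                  update_date TEXT, amount REAL)""")

def apply_input_file(conn, records):
    """Rerun-safe load: rows already present from the first run are
    updated in place instead of being deleted and re-inserted."""
    conn.executemany(
        "INSERT INTO film_transactions (tran_id, update_date, amount)"
        " VALUES (?, ?, ?)"
        " ON CONFLICT(tran_id) DO UPDATE SET"
        "   update_date = excluded.update_date, amount = excluded.amount",
        records)
    conn.commit()

apply_input_file(conn, [(1, "2007-01-24", 10.0), (2, "2007-01-24", 20.0)])
# Rerun with a corrected input file: row 2 changed, row 3 is new.
apply_input_file(conn, [(1, "2007-01-24", 10.0), (2, "2007-01-24", 25.0),
                        (3, "2007-01-24", 30.0)])
print(conn.execute("SELECT amount FROM film_transactions"
                   " WHERE tran_id = 2").fetchone()[0])  # prints 25.0
```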
Jan 25 '07 #12

aj
Frank:
Here's an OLAPy trick that I sometimes use:

Let's say you want to delete rows from a very large table based on a
sysdate column. You *don't* want to overfill the transaction logs

The answer: Figure out how many rows you can safely delete w/ your
logs, use row_number() to slap a number on each one, and delete based
not only on your sysdate, but also that number.
Let's say you can safely delete up to 200000 rows, and you only want to
delete rows where sysdate = 5/1/2005:

lock table mytable in exclusive mode ;
delete
FROM (SELECT sysdate , row_number() OVER
(ORDER BY sysdate)
AS rn FROM mytable)
AS tr WHERE rn BETWEEN 1 and 200000 and sysdate = '5/1/2005' ;
COMMIT ;

Stick this in a loop and stop when no rows get deleted anymore.

HTH

aj
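[Editor's sketch] aj's loop carries over to any engine with window functions; here it is sketched in Python with sqlite3 (needs a reasonably recent SQLite), with the date filter moved inside the numbering subquery so each pass deletes a full batch. The mytable/sysdate names follow the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (sysdate TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)",
                 [("2005-05-01",)] * 450 + [("2005-05-02",)] * 7)
conn.commit()

def delete_by_rownum(conn, target_date, max_rows=200):
    """One pass of aj's loop: number the qualifying rows with
    row_number() and delete at most `max_rows` of them, then commit."""
    cur = conn.execute(
        """DELETE FROM mytable WHERE rowid IN (
             SELECT rowid FROM (
               SELECT rowid, row_number() OVER (ORDER BY sysdate) AS rn
               FROM mytable WHERE sysdate = ?)
             WHERE rn BETWEEN 1 AND ?)""",
        (target_date, max_rows))
    conn.commit()
    return cur.rowcount

# Stick it in a loop and stop when no rows get deleted anymore.
total = 0
while (n := delete_by_rownum(conn, "2005-05-01")) > 0:
    total += n
print(total)  # prints 450
```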

Jan 25 '07 #13

Another option might be to create the table as an MDC (MultiDimensional
Cluster) table, clustered on UPDATE_DATE, and then turn on the option
MDC ROLLOUT (not sure of exact syntax). This should then allow you to
just delete all the rows for that date. According to the literature,
it should just mark each block (set of pages) for that cluster as
deleted, log each block as being deleted, and commit. If you think
about this, it should delete the 1,000,000 or so rows very quickly and
NOT fill up your log files.

Disclaimer - I haven't (yet) been able to use this myself, so no actual
experience here. But if this is a new DB2 table/DB, it might be a
great time to check this feature out.

-Chris

Jan 25 '07 #14

Knut Stolze <st****@de.ibm.com> wrote on 01/25/07 12:41 AM:
>Frank Swarbrick wrote:
>>My probably very naive thought is that it would be nice to have some sort of
>>DELETE statement that didn't even do logging. While often (usually?) you
>>would want to be able to ROLLBACK a DELETE, in the case of what I'm doing
>>there's no reason I would ever want to rollback. So why log? Just
>>wondering...

>The logs are also used for crash recovery. Let's assume you run the
>unlogged DELETE. Now your application or the DB2 server crashes before you
>issued a COMMIT. Upon restart, DB2 has to make sure the transaction is
>properly rolled back and the database is in a consistent state. If you
>don't log the DELETE, you are out of luck there.
Are you saying that DB2 occasionally crashes?
:-) (Just kidding.)
As you can tell, I'm hopelessly naive about these things. I'm just a simple
application programmer.
>What would be nice to have in this respect is an option for the DELETE
>statement to explicitly turn off logging - which would have a certain
>amount of problems as I just mentioned. Truncating a whole table is
>supported that way already: you can use ALTER TABLE ... ACTIVATE NOT LOGGED
>INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
>partitioning is an option for you?
I don't know about range partitioning. Can you give me a pointer to some
information on this?

Thanks!

Frank
---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Jan 25 '07 #15

Mark A <no****@nowhere.com> wrote on 01/24/07 7:40 PM:
>Frank Swarbrick wrote:
>>Interesting. I just figured that this would be much less efficient than
>>doing just the delete with the fullselect, because in the latter case no
>>data need be returned to the AR. Anyway, I will give it a shot.

>It depends on where the program runs. If the program runs on the server and
>the static SQL is bound into a package that runs on the server, then there is
>not that much difference in performance (unless performance is
>ultra-critical). If the program is running remotely, then there would be a
>big difference in performance.
>
>I have written SQL stored procedures to do mass deletes with a cursor and it
>performs well.
>
>I haven't seen too many COBOL programs running on DB2 LUW. What compiler are
>you using? I used MicroFocus COBOL against OS/2 Database Manager, but that
>was in 1991.
We're doing it in kind of an odd way. And for now, we're only testing. We
are using "DB2 Server for VSE" as the client, with the IBM COBOL for VSE/ESA
compiler. But all of our databases are remote databases on DB2/LUW.

So we definitely fall into the category of a remote client, not a client
running on the server.

But to answer your question anyway, I have been successful using both Micro
Focus Net Express (COBOL) 5.0 as well as OpenCobol 0.33 to access DB2/LUW
databases.

Frank
---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Jan 25 '07 #16

Knut Stolze <st****@de.ibm.com> wrote on 01/25/07 1:38 AM:
>Frank Swarbrick wrote:
>Mark A<no****@nowhere.com01/24/07 4:36 PM >>>
>>>Yes there is a better way that will avoid filling up your DB2 z/OS logs.

Actually, DB2/LUW, but I'm guessing your advice still applies.

In that case, have you considered the MERGE statement? Then you may not
have to DELETE the rows at all - just UPDATE them.
Yet another thing I am not familiar with. I will look into it. Thanks.

Frank

---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Jan 25 '07 #17

aj <ro****@mcdonalds.com> wrote on 01/25/07 7:10 AM:
Sounds interesting. And brings up another question. Is there any way to
dynamically determine how many rows I can delete w/o filling up the logs?

Lots of good responses to this. Thanks all!

Frank
---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Jan 25 '07 #18

Frank Swarbrick wrote:
>>The logs are also used for crash recovery. Let's assume you run the
>>unlogged DELETE. Now your application or the DB2 server crashes before you
>>issued a COMMIT. Upon restart, DB2 has to make sure the transaction is
>>properly rolled back and the database is in a consistent state. If you
>>don't log the DELETE, you are out of luck there.

Are you saying that DB2 occasionally crashes?
:-) (Just kidding.)
I can't really comment on that. DB2 crashes quite often in my environment -
sometimes on purpose, sometimes not. If not, then it is usually due to my
(wrong) code changes, of course. ;-)

Anyway, just think of someone tripping over the power cable or using Windows
as OS...
>>What would be nice to have in this respect is an option for the DELETE
>>statement to explicitly turn off logging - which would have a certain
>>amount of problems as I just mentioned. Truncating a whole table is
>>supported that way already: you can use ALTER TABLE ... ACTIVATE NOT LOGGED
>>INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
>>partitioning is an option for you?

I don't know about range partitioning. Can you give me a pointer to some
information on this?
I guess Serge is the most knowledgeable about this. In a nutshell: you have
one logical table that is internally stored as multiple physical tables.
DB2 will distribute your data across those physical tables. For that, it
needs some criteria/algorithm for the distribution. With range
partitioning, you define ranges and a value in a row that fits into one
range goes into the physical table for that range. During query time, the
DB2 optimizer will analyze the query and if it finds that the query
searches on ranges, it can eliminate scanning some/most of the physical
tables, for instance.

Another side effect is that you have now (V9) ALTER TABLE ... ATTACH
PARTITION and ALTER TABLE ... DETACH PARTITION SQL statements.
(http://publib.boulder.ibm.com/infoce...c/r0000888.htm)
Essentially, those statements switch a regular base table to such a
mentioned physical table and group it to the logical table - or vice versa.
Thus, you can roll-in and roll-out ranges of a table with a single SQL
statement.

If you can partition your table according to your deletion criteria, you can
detach the internal, physical table holding the data you want to remove.
It becomes a regular table, which you can drop.

p.s: I hope I didn't screw up too much on the terminology.
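[Editor's sketch] The roll-out idea can be mimicked by hand to see why it is cheap: one table per range, and removing a range is a catalog operation rather than row-by-row logged deletes. A toy sqlite3 sketch (real DB2 range partitioning manages the physical tables internally; the per-month naming scheme here is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def create_partition(conn, yyyymm):
    """One 'physical table' per month range -- a hypothetical layout
    standing in for DB2's internally managed partitions."""
    conn.execute(f"CREATE TABLE trans_{yyyymm} (update_date TEXT, amt REAL)")

def detach_and_drop(conn, yyyymm):
    """The DETACH PARTITION idea: removing a whole date range is a
    cheap catalog operation, not millions of logged row deletes."""
    conn.execute(f"DROP TABLE trans_{yyyymm}")

for m in ("200611", "200612", "200701"):
    create_partition(conn, m)
conn.execute("INSERT INTO trans_200611 VALUES ('2006-11-03', 1.0)")

detach_and_drop(conn, "200611")   # the November 2006 range is gone
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # prints ['trans_200612', 'trans_200701']
```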

--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany
Jan 25 '07 #19

"Frank Swarbrick" <Fr*************@efirstbank.com> wrote:
We're doing it in kind of an odd way. And for now, we're only testing. We
are using "DB2 Server for VSE" as the client, with the IBM COBOL for VSE/ESA
compiler. But all of our databases are remote databases on DB2/LUW.

So we definitely fall into the category of a remote client, not a client
running on the server.

But to answer your question anyway, I have been successful using both Micro
Focus Net Express (COBOL) 5.0 as well as OpenCobol 0.33 to access DB2/LUW
databases.

Frank
Given the above, I would create an SQL stored procedure to do the deletes.
It will run on the LUW server (you can call it from a remote client with any
parms you want) and it should perform quite well.
Jan 26 '07 #20

Mark A <no****@nowhere.com> wrote on 01/25/07 5:45 PM:

>Given the above, I would create an SQL stored procedure to do the deletes.
>It will run on the LUW server (you can call it from a remote client with any
>parms you want) and it should perform quite well.
Sounds right. To be honest, we probably would not do that for this
particular case, simply because it's a situation that occurs just this side
of never. But I will keep it in mind.
Thanks,
Frank
---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Jan 26 '07 #21

Hi!

I'd propose another method that involves EXPORT and a LOAD operation:
- first you EXPORT the rows you want to preserve
- then do a LOAD with REPLACE
Example:
1.) EXPORT FROM SCHEMA.TABLE1 INTO TABLE1.IXF OF IXF SELECT * FROM
SCHEMA.TABLE1 WHERE DATE BETWEEN CURRENT DATE AND CURRENT DATE - 3 DAYS
2.) LOAD FROM TABLE1.IXF OF IXF REPLACE INTO SCHEMA.TABLE1
After running this you'll probably have to do some SET INTEGRITY statements
if you have RI.

I only know DB2 for LUW, but if your ADMIN_CMD procedure supports EXPORT and
LOAD operations then you could even do this via SQL procedure.

Best regards,
Kovi
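[Editor's sketch] Kovi's approach, simulated in sqlite3: copy the survivors aside, then swap the rebuilt table in. A sketch only -- the real EXPORT/LOAD utilities run outside SQL and avoid per-row logging, which is the whole point; the table name and dates are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (d TEXT, v INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                 [("2007-01-24", i) for i in range(100)] +
                 [("2007-01-23", i) for i in range(40)])
conn.commit()

# "EXPORT": copy only the survivors (everything but today's rows).
conn.execute(
    "CREATE TABLE t1_keep AS SELECT * FROM t1 WHERE d <> '2007-01-24'")
# "LOAD ... REPLACE": swap the rebuilt table in for the original.
conn.execute("DROP TABLE t1")
conn.execute("ALTER TABLE t1_keep RENAME TO t1")
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0])  # prints 40
```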

--
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
| In A World Without Fences Who Needs Gates? |
| Experience Linux. |
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Jan 26 '07 #22

P: n/a
What would be nice to have in this respect is an option for the DELETE
statement to explicitly turn off logging - which would have a certain
amount of problems as I just mentioned. Truncating a whole table is
supported that way already: you can use ALTER TABLE ... ACTIVATE NOT LOGGED
INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
partitioning is an option for you?
What 'activate not logged initially' does (without the 'with empty
table' option) is that it suspends logging until the next commit. So
you CAN use it without partitioning. Just activate not logged
initially, do your delete and commit.
Beware of any errors (already mentioned)! If for some reason your
delete fails, db2 cannot roll back and the table becomes invalid. Also,
if you have to rollforward after a restore, there is no way to
rollforward over a 'not logged' transaction, your table will be invalid
after rollforward.
We have used this feature for a one-time extension of a very big table,
but took precautions before doing it: backup AND snapshot. :-)
By the way: emptying a complete table is very easy with an import from
/dev/null (on unix).

Jan 26 '07 #23

Good thought, but with two years of data at 100,000 to 1,000,000 records a
day I don't think this would be very efficient! :-)
(This is a transaction history file with two years of transactions.)

Thanks,

Frank

---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Jan 26 '07 #24

Frank Swarbrick wrote:
>Knut Stolze wrote:
>>In that case, have you considered the MERGE statement? Then you may not
>>have to DELETE the rows at all - just UPDATE them.
Yet another thing I am not familiar with. I will look into it. Thanks.
Just updating the non-key columns in each existing row (and inserting the
missing row if the record was not in the original change set) will be FAR
more efficient. The delete-then-add operations must update each index key
twice - once to remove the original record's keys and again to re-add the
row's keys. This is very expensive.

If you can determine whether all (or at least most) operations will be
replacing an existing row, and apparently you can, then just do the updates
in place. My testing has found that this is ALWAYS faster to update than to
delete-then-add and is faster than trying to insert the row and updating
instead if the insert fails with a unique key violation (i.e., the row
already exists), except when fewer than about 30% of the rows will have
already pre-existed. When
more than about 70% of the rows do not exist already it is usually faster to
do the insert and update if the insert fails (the exact cutoff depends on
the number of indexes, whether the update modifies indexed columns, the
order the database engine chooses to check the unique keys for violations,
etc.).

Obviously YMMV so you need to test it yourself.

Art S. Kagel
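[Editor's sketch] Art's update-first strategy sketched with sqlite3; the cursor's rowcount tells us whether the UPDATE touched an existing row. Table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trans (
                  tran_id INTEGER PRIMARY KEY, amount REAL)""")
conn.execute("INSERT INTO trans VALUES (1, 10.0)")

def upsert_update_first(conn, tran_id, amount):
    """Try UPDATE first; INSERT only when no row was touched.
    Avoids the double index maintenance of delete-then-add, and wins
    when most incoming records already exist."""
    cur = conn.execute("UPDATE trans SET amount = ? WHERE tran_id = ?",
                       (amount, tran_id))
    if cur.rowcount == 0:            # record not present: first sighting
        conn.execute("INSERT INTO trans VALUES (?, ?)", (tran_id, amount))

upsert_update_first(conn, 1, 15.0)   # existing row: updated in place
upsert_update_first(conn, 2, 20.0)   # new row: inserted
conn.commit()
print(conn.execute("SELECT tran_id, amount FROM trans ORDER BY tran_id")
      .fetchall())  # prints [(1, 15.0), (2, 20.0)]
```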
Jan 29 '07 #25

This discussion thread is closed.