deleting large numbers of records

We have a batch process that inserts large numbers (100,000 - 1,000,000) of
records into a database each day. (DL/I database.) We're considering
converting it to a DB2 table. Currently we have logic in place that, prior
to inserting any data, reads the first input record and checks to see if it
already exists in the table. If the record already exists there are two
options:
1) Don't continue, because you already ran this job today!
2) This is a rerun - continue.

If number 2 is selected the first thing that happens is that it deletes
every record that was inserted today prior to doing the regular insert
process. (You may ask, why not just skip over the ones that are already
there. It's because we may be rerunning with an updated input file, where
the input records may be different than during the first run.)

Anyway, I figured with DB2 this would be a snap. All I'd need to do is:
EXEC SQL
DELETE FROM FILM.FILM_TRANSACTIONS
WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
END-EXEC

The only problem is that my log file would end up running out of room. So
now I've come up with the following:

DELETE-TODAY SECTION.
DISPLAY 'DELETE PROCESS BEGINS' UPON CONSOLE
PERFORM WITH TEST AFTER
UNTIL SQLCODE = 100
DISPLAY 'COMMITTING...' UPON CONSOLE
PERFORM COMMIT-UOW
DISPLAY 'DELETING 10000' UPON CONSOLE
PERFORM DB2-DELETE-TODAY
END-PERFORM
PERFORM COMMIT-UOW
DISPLAY 'DELETE PROCESS ENDS' UPON CONSOLE
.

DB2-DELETE-TODAY SECTION.
EXEC SQL
DELETE FROM (
SELECT UPDATE_DATE
FROM FILM.FILM_TRANSACTIONS
WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
FETCH FIRST 10000 ROWS ONLY
)
WHERE 1 = 1
END-EXEC
CALL CHECKERR USING SQLCA ERRLOC
.

My question is, is this the way to go or is there some better way?

I tried making the "10000" a host variable, but that didn't work. Any way
around this?

You may be wondering why I put the "WHERE 1 = 1" clause on the DELETE
statement. This is because DB2 gives a warning if you pre-compile a DELETE
or UPDATE statement without a WHERE clause. It still works, but I like to
avoid warnings.

Thanks!
Frank
---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Jan 24 '07 #1
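A side note on the host-variable question above: FETCH FIRST n ROWS ONLY has
traditionally required an integer constant in static SQL, which is likely why
the host variable didn't work. One workaround - a sketch only, assuming
dynamic SQL is acceptable, with CURRENT DATE standing in for the
FB_FUNC.TO_DATE host-variable call - is to build the statement text at run
time (SQL PL fragment):

DECLARE V_LIMIT INTEGER DEFAULT 10000;
DECLARE V_STMT VARCHAR(500);
-- concatenate the row limit into the statement text, then prepare/execute
SET V_STMT = 'DELETE FROM (SELECT 1 FROM FILM.FILM_TRANSACTIONS'
          || ' WHERE UPDATE_DATE = CURRENT DATE'
          || ' FETCH FIRST ' || RTRIM(CHAR(V_LIMIT)) || ' ROWS ONLY)';
PREPARE S1 FROM V_STMT;
EXECUTE S1;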
"Frank Swarbrick" <Fr************ *@efirstbank.co mwrote in message
news:51******** *****@mid.indiv idual.net...
> We have a batch process that inserts large numbers (100,000 - 1,000,000) of
> records into a database each day. (DL/I database.) We're considering
> converting it to a DB2 table.
> [snip - remainder of the original post, quoted in full above]
Yes, there is a better way that will avoid filling up your DB2 z/OS logs.

Declare a cursor with something like:
SELECT UPDATE_DATE FROM FILM.FILM_TRANSACTIONS
WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
FOR UPDATE WITH HOLD

OPEN cursor-name.

Then FETCH one row at a time (until RC = 100).

DELETE WHERE CURRENT OF C1 (cursor name).

COMMIT. You could commit every 100 - 10000 if you don't want to commit each
row.

CLOSE cursor-name.

Note that the WITH HOLD will retain the cursor position even though you have
committed (which usually closes a cursor).

Please consult the SQL Reference for details, because I am posting all the
above syntax from (my) memory.

Also, you'll find that there is very little increase in performance in
committing more often than every 100 updates. Committing more often will
provide increased concurrency with other applications.
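
For illustration, a minimal SQL PL sketch of that cursor-based pattern. The
table and predicate come from the original post, but the procedure name, the
1,000-row commit interval, and the CURRENT DATE stand-in for the
FB_FUNC.TO_DATE call are assumptions:

CREATE PROCEDURE FILM.PURGE_TODAY()
LANGUAGE SQL
BEGIN
  DECLARE V_DONE SMALLINT DEFAULT 0;
  DECLARE V_COUNT INTEGER DEFAULT 0;
  DECLARE V_DATE DATE;
  -- WITH HOLD keeps the cursor open and positioned across commits
  DECLARE C1 CURSOR WITH HOLD FOR
    SELECT UPDATE_DATE
    FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = CURRENT DATE
    FOR UPDATE;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET V_DONE = 1;

  OPEN C1;
  FETCH C1 INTO V_DATE;
  WHILE V_DONE = 0 DO
    DELETE FROM FILM.FILM_TRANSACTIONS WHERE CURRENT OF C1;
    SET V_COUNT = V_COUNT + 1;
    IF MOD(V_COUNT, 1000) = 0 THEN
      COMMIT;   -- release log space and locks every 1,000 rows
    END IF;
    FETCH C1 INTO V_DATE;
  END WHILE;
  CLOSE C1;
  COMMIT;
END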
Jan 24 '07 #2
"Mark A" <no****@nowhere .comwrote in message
news:pO******** *************** *******@comcast .com...
> Also, you'll find that there is very little increase in performance in
> committing more often than every 100 updates. Committing more often will
> provide increased concurrency with other applications.
Correction, you'll find that there is very little increase in performance in
committing LESS often than every 100 updates.
Jan 24 '07 #3
I think both solutions have merit. Frank, your version is very much
like the approach outlined in Serge's "SQL on Fire" seminars, albeit in
COBOL as opposed to SQL/PL. Mark's solution will use less log space
than yours, but I respectfully wonder about the time required to delete
all of the target rows when they're being fetched and deleted
one-at-a-time.

FWIW, here's how I've done it (in a stored procedure):

CREATE PROCEDURE CSE.PURGE_LRD()
BEGIN

DECLARE V_NO_DATA SMALLINT DEFAULT 0;--
DECLARE V_DEADLOCK_OR_LTO SMALLINT DEFAULT 0;--

DECLARE C_DEADLOCK_OR_LTO CONDITION FOR SQLSTATE '40001';--

DECLARE CONTINUE HANDLER FOR NOT FOUND
SET V_NO_DATA = 1;--

-- The V_DEADLOCK_OR_LTO variable is throw-away,
-- but a continue handler needs to do something,
-- i.e., it's not enough to just declare a handler,
-- it has to have an action in its body.
DECLARE CONTINUE HANDLER FOR C_DEADLOCK_OR_LTO
SET V_DEADLOCK_OR_LTO = 1;--

WHILE (V_NO_DATA = 0) DO
DELETE FROM
(
SELECT
1
FROM
LRD
FETCH FIRST 200000 ROWS ONLY
) AS LRD_D;--

COMMIT;--
END WHILE;--
END;
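
For completeness, the procedure is then invoked from the purge job with a
plain CALL:

CALL CSE.PURGE_LRD();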

--Jeff

On Jan 24, 3:44 pm, "Mark A" <nob...@nowhere.com> wrote:
> "Mark A" <nob...@nowhere.com> wrote in message
> news:pO**********************@comcast.com...
> > Also, you'll find that there is very little increase in performance in
> > committing more often than every 100 updates. Committing more often will
> > provide increased concurrency with other applications.
> Correction, you'll find that there is very little increase in performance in
> committing LESS often than every 100 updates.
Jan 25 '07 #4
>>> Mark A <no****@nowhere.com> 01/24/07 4:36 PM >>>
>Yes, there is a better way that will avoid filling up your DB2 z/OS logs.
Actually, DB2/LUW, but I'm guessing your advice still applies.
>Declare a cursor with something like:
>SELECT UPDATE_DATE FROM FILM.FILM_TRANSACTIONS
>WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
>FOR UPDATE WITH HOLD

>OPEN cursor-name.
>
>Then FETCH one row at a time (until RC = 100).
>
>DELETE WHERE CURRENT OF C1 (cursor name).
>
>COMMIT. You could commit every 100 - 10000 if you don't want to commit each
>row.
>
>CLOSE cursor-name.
Interesting. I just figured that this would be much less efficient than
doing just the delete with the fullselect, because in the latter case no
data need be returned to the AR. Anyway, I will give it a shot.
>Note that the WITH HOLD will retain the cursor position even though you have
>committed (which usually closes a cursor).
>
>Please consult the SQL Reference for details, because I am posting all the
>above syntax from (my) memory.
>
>Also, you'll find that there is very little increase in performance in
>committing more often than every 100 updates. Committing more often will
>provide increased concurrency with other applications.
Will keep that in mind. Thanks!

Frank

---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Jan 25 '07 #5
Oh! So every 10,000 actually does make sense then, right?
>>> Mark A <no****@nowhere.com> 01/24/07 4:44 PM >>>
>"Mark A" <no****@nowhere.com> wrote in message
>news:pO**********************@comcast.com...
>> Also, you'll find that there is very little increase in performance in
>> committing more often than every 100 updates. Committing more often will
>> provide increased concurrency with other applications.
>Correction, you'll find that there is very little increase in performance in
>committing LESS often than every 100 updates.


Jan 25 '07 #6
>>> jefftyzzer <je*******@sbcglobal.net> 01/24/07 5:54 PM >>>
>I think both solutions have merit. Frank, your version is very much
>like the approach outlined in Serge's "SQL on Fire" seminars, albeit in
>COBOL as opposed to SQL/PL. Mark's solution will use less log space
>than yours, but I respectfully wonder about the time required to delete
>all of the target rows when they're being fetched and deleted
>one-at-a-time.
That was my concern as well. In any case, I will give both methods a shot
and see which I prefer.
>FWIW, here's how I've done it (in a stored procedure):
>
>CREATE PROCEDURE CSE.PURGE_LRD()
>[snip - procedure quoted in full above]
My probably very naive thought is that it would be nice to have some sort of
DELETE statement that didn't even do logging. While often (usually?) you
would want to be able to ROLLBACK a DELETE, in the case of what I'm doing
there's no reason I would ever want to rollback. So why log? Just
wondering...

Thanks!

Frank

---
Frank Swarbrick
Senior Developer/Analyst - Mainframe Applications
FirstBank Data Corporation - Lakewood, CO USA
Jan 25 '07 #7
"Frank Swarbrick" <Fr************ *@efirstbank.co mwrote in message
news:51******** *****@mid.indiv idual.net...
Oh! So every 10,000 actually does make sense then, right?
No. I would commit every 100 - 1000 rows. A commit is not that expensive,
unless you do it for every row. Doing it every 100 rows or 10,000 will
provide about a 2% difference in performance at best.

The major expense of a commit is a synchronous write of the log buffer to
disk. But the log buffer will be flushed to disk anyway whenever the log
buffer is full, or about every 1 second. In any case, a commit is usually
only as expensive as a few deletes, so do the math.
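
Doing the math with illustrative numbers (the 5:1 cost ratio below is an
assumption for the arithmetic, not a measurement):

commit cost ~= 5 deletes (assumed)
commit every 100 rows:    5 / 100    = 5.0% overhead
commit every 1,000 rows:  5 / 1,000  = 0.5% overhead
commit every 10,000 rows: 5 / 10,000 = 0.05% overhead

Past a few hundred rows per commit the savings become negligible, while lock
hold times keep growing.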
Jan 25 '07 #8
"Frank Swarbrick" <Fr************ *@efirstbank.co mwrote in message
news:51******** *****@mid.indiv idual.net...
Interesting. I just figured that this would be much less efficient than
doing just the delete with the fullselect, because in the latter case no
data need be returned to the AR. Anyway, I will give it a shot.
It depends on where the program runs. If the program runs on the server and
the static SQL is bound into a package that runs on the server, then there is
not that much difference in performance (unless performance is
ultra-critical). If the program is running remotely, then there would be a
big difference in performance.

I have written SQL stored procedures to do mass deletes with a cursor and it
performs well.

I haven't seen too many COBOL programs running on DB2 LUW. What compiler are
you using? I used MicroFocus COBOL against OS/2 Database Manager, but that
was in 1991.
Jan 25 '07 #9
Frank Swarbrick wrote:
> [snip - quoted exchange and the CSE.PURGE_LRD procedure shown in full above]
>
> My probably very naive thought is that it would be nice to have some sort of
> DELETE statement that didn't even do logging. While often (usually?) you
> would want to be able to ROLLBACK a DELETE, in the case of what I'm doing
> there's no reason I would ever want to rollback. So why log? Just
> wondering...
The logs are also used for crash recovery. Let's assume you run the
unlogged DELETE. Now your application or the DB2 server crashes before you
issued a COMMIT. Upon restart, DB2 has to make sure the transaction is
properly rolled back and the database is in a consistent state. If you
don't log the DELETE, you are out of luck there.

What would be nice to have in this respect is an option for the DELETE
statement to explicitly turn off logging - which would have a certain
amount of problems, as I just mentioned. Truncating a whole table is
supported that way already: you can use ALTER TABLE ... ACTIVATE NOT LOGGED
INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
partitioning, is an option for you?
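
As a sketch of that idea (the table name is hypothetical, and the table must
have been created with the NOT LOGGED INITIALLY attribute for the ALTER to be
allowed):

-- empties the table without logging the individual row deletes;
-- normal logging resumes once the unit of work commits
ALTER TABLE FILM.FILM_TRANS_TODAY
  ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE;
COMMIT;

With range partitioning, detaching the partition that holds today's rows
would have a similar effect.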

--
Knut Stolze
DB2 z/OS Utilities Development
IBM Germany
Jan 25 '07 #10
