
Improving DELETE performance for a large number of rows

Hello,

Environment: DB2 LUW v8 FP15 / Linux

I have a table with 50+ million rows. The table structure is basically
(ID, timestamp). I have two main applications: one inserts rows, and
the other reads and deletes rows.

The 'deleter' application runs a MIN/MAX(timestamp) for each ID and,
if the difference between min and max is greater than 1h, it reads all
the rows, summarizes the data, then starts the deletion process.
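The check itself is just a grouped aggregate, something along these
lines (simplified; real table and column names differ):

    -- simplified sketch with placeholder names
    SELECT id, MIN(ts), MAX(ts)
    FROM my_table
    GROUP BY id
    HAVING MAX(ts) >= MIN(ts) + 1 HOUR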

We have developed C++ procedures that run on the DB server and
summarize the data. The read performance of the C++ procedures is very
good. However, all DELETE statements are sent from the remote
application server, and sometimes their performance is very poor. The
application is coded to break the deletes into chunks, in such a way
that a single delete statement would, on average, never delete more
than 2k rows.
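Each chunk is removed with a plain searched delete; the application
narrows the timestamp window so that one statement covers roughly 2k
rows at most:

    -- placeholder names; the (?, ?) window is sized to cover ~2000 rows
    DELETE FROM my_table
    WHERE id = ? AND ts BETWEEN ? AND ?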

I am trying to find a way to improve my overall DELETE performance. I
was reading another thread that discussed deleting a large number of
rows, and I was wondering if deleting rows from inside a C++ procedure
would be faster than sending the statements from a remote application.

Thanks in advance.

Sep 28 '07 #1
3 Replies


"Michel Esber" <mi****@us.automatos.comwrote in message
news:11*********************@19g2000hsx.googlegrou ps.com...
Hello,

Environment: DB2 LUW v8 FP15 / Linux

I have a table with 50+ Million rows. The table structure is basically
(ID - Timestamp). I have two main applications - one inserting rows,
and the other reading/deleting rows.

The 'deleter' application runs a MIN/MAX (timestamp) for each ID and,
if the difference between min/max is greater than 1h, it reads all
rows, summarizes data, then start the deletion process.

We have developed C++ procedures that run on the DB server and
summarize data. The performance of C++ procedures reading data is very
good. However, all DELETE statements are sent from the remote
application server, and sometimes performance is very poor. The
application is coded to break the delete statement in chunks, in such
a way that a delete statement would, in average, never delete more
than 2k rows.

I am trying to find a way to improve my overall DELETE performance. I
was reading another thread that discussed deleting large number of
rows, and I was wondering if deleting rows from inside C++ procedure
would perform faster than sending statements from a remote
application.

Thanks in advance.
A simple SQL stored procedure would work fine. You should commit every 2000
rows or more often.
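Something along these lines, as a rough and untested sketch (table,
column, and procedure names are placeholders):

    -- Untested sketch: chunked delete with a commit per chunk.
    -- Create with an alternate statement terminator (e.g. @) in the CLP.
    CREATE PROCEDURE PURGE_RANGE (IN p_id INTEGER,
                                  IN p_from TIMESTAMP,
                                  IN p_to TIMESTAMP)
    LANGUAGE SQL
    BEGIN
      DECLARE v_deleted INTEGER DEFAULT 1;
      WHILE v_deleted > 0 DO
        -- delete at most 2000 rows per statement ...
        DELETE FROM (SELECT 1 FROM my_table
                     WHERE id = p_id
                       AND ts BETWEEN p_from AND p_to
                     FETCH FIRST 2000 ROWS ONLY) AS chunk;
        GET DIAGNOSTICS v_deleted = ROW_COUNT;
        -- ... and commit each chunk so log usage stays bounded
        COMMIT;
      END WHILE;
    END@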
Sep 29 '07 #2

On 28 set, 21:36, "Mark A" <nob...@nowhere.comwrote:
"Michel Esber" <mic...@us.automatos.comwrote in message

news:11*********************@19g2000hsx.googlegrou ps.com...


Hello,
Environment: DB2 LUW v8 FP15 / Linux
I have a table with 50+ Million rows. The table structure is basically
(ID - Timestamp). I have two main applications - one inserting rows,
and the other reading/deleting rows.
The 'deleter' application runs a MIN/MAX (timestamp) for each ID and,
if the difference between min/max is greater than 1h, it reads all
rows, summarizes data, then start the deletion process.
We have developed C++ procedures that run on the DB server and
summarize data. The performance of C++ procedures reading data is very
good. However, all DELETE statements are sent from the remote
application server, and sometimes performance is very poor. The
application is coded to break the delete statement in chunks, in such
a way that a delete statement would, in average, never delete more
than 2k rows.
I am trying to find a way to improve my overall DELETE performance. I
was reading another thread that discussed deleting large number of
rows, and I was wondering if deleting rows from inside C++ procedure
would perform faster than sending statements from a remote
application.
Thanks in advance.

A simple SQL stored procedure would work fine. You should commit every 2000
rows or more often.

Thanks Mark.

What would perform better?

- delete from table where ID = ? and timestamp between ? and ?

or

- Use cursors to delete data (chunks of 2k rows).

In both cases, no other application reads the same set of rows, so
concurrency is a minor issue.

Thanks, Michel

Sep 29 '07 #3

On Sep 29, 6:23 pm, Michel Esber <mic...@us.automatos.com> wrote:
<snip>
What would perform better?
- delete from table where ID = ? and timestamp between ? and ?
or
- Use cursors to delete data (chunks of 2k rows).

One reason Mark suggested committing every 2k or so rows is to make
sure your logs don't fill up. There is no hard-and-fast rule for when
to commit; you could commit every 100k rows if your log space is big
enough.

Performance-wise, there won't be much difference between doing the
same job through an SP or a dynamic statement, unless the SP was
compiled long ago and the table statistics have changed a lot since
then.

One dumb (but simple) way to delete a subset of the data each time
with a dynamic statement would be:

    delete from table_name
    where primary_key in (select primary_key from table_name
                          where timestamp between ? and ?
                          fetch first 10000 rows only)

executed repeatedly, with a commit after each round, until it deletes
zero rows.

Of course, you have better control with an SP. Rather than writing
C++ (or Java) procedures, you might find SQL SPs simpler and easier
to write. Take your pick.
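If you do go the cursor route, the usual shape inside an SQL SP is a
WITH HOLD cursor plus DELETE ... WHERE CURRENT OF, committing every N
rows. Another untested sketch with placeholder names:

    -- Untested sketch: cursor-based delete, committing every 2000 rows.
    -- WITH HOLD keeps the cursor open across the intermediate commits.
    CREATE PROCEDURE PURGE_BY_CURSOR (IN p_id INTEGER,
                                      IN p_from TIMESTAMP,
                                      IN p_to TIMESTAMP)
    LANGUAGE SQL
    BEGIN
      DECLARE v_done INTEGER DEFAULT 0;
      DECLARE v_dummy INTEGER;
      DECLARE v_count INTEGER DEFAULT 0;
      DECLARE c1 CURSOR WITH HOLD FOR
        SELECT 1 FROM my_table
        WHERE id = p_id AND ts BETWEEN p_from AND p_to;
      DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = 1;
      OPEN c1;
      FETCH c1 INTO v_dummy;
      WHILE v_done = 0 DO
        DELETE FROM my_table WHERE CURRENT OF c1;
        SET v_count = v_count + 1;
        IF MOD(v_count, 2000) = 0 THEN
          COMMIT;  -- frees log space every 2000 rows
        END IF;
        FETCH c1 INTO v_dummy;
      END WHILE;
      CLOSE c1;
      COMMIT;
    END@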

Cheers,
Sanjuro

Oct 1 '07 #4
