Bytes | Software Development & Data Engineering Community

Improving DELETE performance for a large number of rows

Hello,

Environment: DB2 LUW v8 FP15 / Linux

I have a table with 50+ million rows. The table structure is basically
(ID, Timestamp). I have two main applications: one inserts rows, and
the other reads and deletes rows.

The 'deleter' application runs a MIN/MAX(timestamp) for each ID and,
if the difference between min and max is greater than 1h, it reads all
rows, summarizes the data, and then starts the deletion process.

We have developed C++ procedures that run on the DB server and
summarize data. The read performance of these C++ procedures is very
good. However, all DELETE statements are sent from the remote
application server, and their performance is sometimes very poor. The
application is coded to break the deletes into chunks, so that a
single DELETE statement, on average, never removes more than 2,000 rows.
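For reference, a chunked delete like the one described might look
roughly like this in DB2; the table and column names (my_table, id,
ts) are my guesses, and deleting from a fullselect is one common DB2
idiom for capping how many rows a single statement touches:

```sql
-- Sketch only: my_table, id and ts are assumed names, not from the post.
-- DB2 LUW allows DELETE against a fullselect, which bounds the number
-- of rows a single statement removes:
DELETE FROM (SELECT 1 FROM my_table
              WHERE id = ?
                AND ts BETWEEN ? AND ?
              FETCH FIRST 2000 ROWS ONLY);
```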

I am trying to find a way to improve my overall DELETE performance. I
was reading another thread that discussed deleting a large number of
rows, and I was wondering whether deleting rows from inside a C++
procedure would perform faster than sending statements from a remote
application.

Thanks in advance.

Sep 28 '07 #1
"Michel Esber" <mi****@us.automatos.comwrote in message
news:11*********************@19g2000hsx.googlegrou ps.com...
[quoted text snipped]
A simple SQL stored procedure would work fine. You should commit every 2000
rows or more often.
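A minimal sketch of such a procedure, assuming a table my_table(id,
ts) and using DB2's delete-from-fullselect idiom to bound each unit of
work (all object names here are hypothetical):

```sql
-- Hypothetical sketch: my_table, id and ts are made-up names.
CREATE PROCEDURE purge_rows (IN p_id   INTEGER,
                             IN p_from TIMESTAMP,
                             IN p_to   TIMESTAMP)
LANGUAGE SQL
BEGIN
  DECLARE v_rows INTEGER DEFAULT 1;
  WHILE v_rows > 0 DO
    -- Remove at most 2000 rows per unit of work to bound log usage
    DELETE FROM (SELECT 1 FROM my_table
                  WHERE id = p_id
                    AND ts BETWEEN p_from AND p_to
                  FETCH FIRST 2000 ROWS ONLY);
    GET DIAGNOSTICS v_rows = ROW_COUNT;
    COMMIT;
  END WHILE;
END
```

Because the procedure runs on the server, each 2,000-row chunk avoids
a network round trip, which is presumably where the remote DELETEs
lose time.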
Sep 29 '07 #2
On 28 Sep, 21:36, "Mark A" <nob...@nowhere.com> wrote:
"Michel Esber" <mic...@us.automatos.com> wrote in message

news:11*********************@19g2000hsx.googlegroups.com...


[quoted text snipped]

A simple SQL stored procedure would work fine. You should commit every 2000
rows or more often.

Thanks Mark.

Which would perform better?

- delete from table where ID = ? and timestamp between ? and ?

or

- using cursors to delete data (chunks of 2,000 rows)?

In both cases, no other application reads the same set of rows, so
concurrency is a minor issue.

Thanks, Michel

Sep 29 '07 #3
On Sep 29, 6:23 pm, Michel Esber <mic...@us.automatos.com> wrote:
[quoted text snipped]
One reason Mark suggested committing every 2,000 rows or so is to
make sure your logs don't fill up. There is no hard-and-fast rule for
when to commit; you could commit every 100k rows if your log space is
big enough.

Performance-wise, there won't be much difference between doing the
same job through an SP or a dynamic statement, unless the SP was
compiled long ago and the table statistics have changed a great deal
since then.

One dumb (but simple) way to delete a subset of the data each time
with a dynamic statement would be:

delete from table_name
where primary_key in (select primary_key from table_name
                      where timestamp between ? and ?
                      fetch first 10000 rows only)

Of course, you have better control with an SP. Rather than writing
C++ (or Java) procedures, you might find a SQL stored procedure
simpler and easier to write. Take your pick.
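For comparison, the cursor-based variant Michel asked about could be
sketched roughly as follows (again, all object names are
hypothetical); a WITH HOLD cursor stays open across the intermediate
commits:

```sql
-- Hypothetical sketch of the cursor alternative.
CREATE PROCEDURE purge_with_cursor (IN p_id   INTEGER,
                                    IN p_from TIMESTAMP,
                                    IN p_to   TIMESTAMP)
LANGUAGE SQL
BEGIN
  DECLARE v_dummy INTEGER;
  DECLARE v_count INTEGER DEFAULT 0;
  DECLARE v_done  INTEGER DEFAULT 0;
  DECLARE c1 CURSOR WITH HOLD FOR
    SELECT 1 FROM my_table
     WHERE id = p_id AND ts BETWEEN p_from AND p_to
     FOR UPDATE;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = 1;

  OPEN c1;
  FETCH c1 INTO v_dummy;
  WHILE v_done = 0 DO
    DELETE FROM my_table WHERE CURRENT OF c1;
    SET v_count = v_count + 1;
    IF MOD(v_count, 2000) = 0 THEN
      COMMIT;   -- WITH HOLD keeps the cursor open across the commit
    END IF;
    FETCH c1 INTO v_dummy;
  END WHILE;
  COMMIT;
  CLOSE c1;
END
```

The positioned delete touches one row per fetch, so it trades the
set-oriented efficiency of a single DELETE for fine-grained control
over commit frequency.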

Cheers,
Sanjuro

Oct 1 '07 #4

This thread has been closed and replies have been disabled. Please start a new discussion.
