Improving DELETE performance for a large number of rows

Michel Esber

Hello,

Environment: DB2 LUW v8 FP15 / Linux

I have a table with 50+ Million rows. The table structure is basically
(ID - Timestamp). I have two main applications - one inserting rows,
and the other reading/deleting rows.

The 'deleter' application runs a MIN/MAX (timestamp) for each ID and,
if the difference between min/max is greater than 1h, it reads all
rows, summarizes the data, then starts the deletion process.

We have developed C++ procedures that run on the DB server and
summarize data. The performance of C++ procedures reading data is very
good. However, all DELETE statements are sent from the remote
application server, and sometimes performance is very poor. The
application is coded to break the delete statement into chunks, so
that a single delete statement would, on average, delete no more
than 2k rows.

I am trying to find a way to improve my overall DELETE performance. I
was reading another thread that discussed deleting a large number of
rows, and I was wondering if deleting rows from inside a C++ procedure
would perform faster than sending statements from a remote
application.

Thanks in advance.

Sep 28 '07 #1

Mark A

A simple SQL stored procedure would work fine. You should commit every 2000
rows or more often.
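
A rough sketch of the commit-every-N-rows pattern, written in Python against SQLite purely as a runnable stand-in for DB2. The table name `readings`, its columns `id` and `ts`, and both function names are made up for illustration; nothing here comes from the original poster's schema. In DB2 the same loop would more likely live in an SQL procedure using a cursor declared WITH HOLD so that it stays open across each COMMIT; this sketch sidesteps that by reading the keys up front.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory table (hypothetical schema) to try the loop on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER, ts INTEGER)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)",
                     [(1, t) for t in range(5000)])
    conn.executemany("INSERT INTO readings VALUES (?, ?)",
                     [(2, t) for t in range(10)])
    conn.commit()
    return conn


def delete_in_batches(conn, row_id, ts_from, ts_to, batch_size=2000):
    """Delete the matching rows, committing after every batch_size rows.

    Returns the number of rows deleted.
    """
    # Collect the keys of the rows to delete before touching the table.
    keys = conn.execute(
        "SELECT rowid FROM readings WHERE id = ? AND ts BETWEEN ? AND ?",
        (row_id, ts_from, ts_to),
    ).fetchall()

    deleted = 0
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        conn.executemany("DELETE FROM readings WHERE rowid = ?", batch)
        # Committing per batch keeps each unit of work (and its log usage) small.
        conn.commit()
        deleted += len(batch)
    return deleted
```

Calling `delete_in_batches(make_demo_db(), 1, 0, 3999)` would remove the rows for ID 1 in two committed batches of 2000. The batch size is the knob to tune against available log space.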

Sep 29 '07 #2

Michel Esber

Thanks Mark.

Which would perform better?

- delete from table where ID = ? and timestamp between ? and ?

or

- Use cursors to delete data (chunks of 2k rows).

In both cases, no other application reads the same set of rows, so
concurrency is a minor issue.

Thanks, Michel

Sep 29 '07 #3

Sanjuro

One reason Mark suggested committing every 2k or so rows is to
make sure that your logs don't fill up. There is no hard-and-fast rule
for when to commit; you may commit every 100k rows if your log space is
big enough.

Performance-wise, there won't be much of a difference between doing
the same job through an SP or a dynamic statement, unless the SP was
compiled long ago and the table statistics have changed a whole lot
since then.

One dumb (but simple) way to delete a subset of the data each
time with a dynamic delete would be to use

" delete from table_name where primary_key in (select primary_key from
table_name where timestamp between ? and ? fetch first 10000 rows
only) "

Of course, you have better control with an SP. Rather than
writing C++ (or Java) procedures, you might find an SQL SP simpler and
easier to write. Take your pick.
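
To make the subselect-based delete above concrete, here is a small sketch that repeats it until nothing is left to delete, committing after each pass. It uses Python with SQLite only so that it can be run anywhere; SQLite spells DB2's "fetch first n rows only" as LIMIT n, and `rowid` stands in for the primary key. The table `readings`, the column `ts`, and the function name `purge_range` are assumptions for illustration, not anything from the original application.

```python
import sqlite3

# One pass: delete at most `?` rows whose timestamp falls in the range.
DELETE_CHUNK = """
DELETE FROM readings
WHERE rowid IN (
    SELECT rowid FROM readings
    WHERE ts BETWEEN ? AND ?
    LIMIT ?
)
"""


def purge_range(conn, ts_from, ts_to, chunk=10000):
    """Run the chunked delete repeatedly; return the total rows removed."""
    total = 0
    while True:
        cur = conn.execute(DELETE_CHUNK, (ts_from, ts_to, chunk))
        conn.commit()  # one commit per chunk bounds the size of each transaction
        if cur.rowcount == 0:
            break  # nothing matched, so the range is empty
        total += cur.rowcount
    return total
```

Each pass re-evaluates the subselect, so the loop needs no cursor state between commits; the price is that the range predicate is re-run once per chunk, which is where an index on the timestamp column would matter.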

Cheers,
Sanjuro

Oct 1 '07 #4
