Bytes IT Community

Delete and concurrency problems

Hello,

Env: DB2 V8 LUW FP16 running Linux

create table X (machine_id varchar(24) not null, ctime timestamp not null);
create index iFoo on X (MACHINE_ID, CTIME) allow reverse scans;
alter table X add primary key (MACHINE_ID, CTIME);

Our C++ application inserts data into a table X using CLI array insert
with very good throughput.

On an hourly basis, another application (running on a different
server) reads the last 'hour':

select * from X where MACHINE_ID = ? and ctime between current timestamp - 1 hour and current timestamp for read only;

After that, it starts a delete loop:

while (SQLCODE < 100)
do
  delete from (select 1 from X where MACHINE_ID = ? and CTIME between
    current timestamp - 1 hour and current timestamp
    fetch first 2000 rows only) as B
done

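For illustration, the chunked-delete loop above can be sketched in runnable form as follows, with SQLite standing in for DB2 (the table layout, chunk size, and the rowid-subquery form are assumptions for the sketch; the actual application would use DB2 CLI and the delete-from-fullselect shown above):

```python
import sqlite3

# Minimal sketch of the chunked-delete pattern, SQLite standing in for DB2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (machine_id TEXT NOT NULL, ctime INTEGER NOT NULL)")
conn.executemany("INSERT INTO X VALUES (?, ?)",
                 [("m1", t) for t in range(5000)])

CHUNK = 2000
deleted_total = 0
while True:
    # Delete at most CHUNK matching rows per statement, analogous to
    # DB2's "delete from (select 1 from X ... fetch first 2000 rows only)".
    cur = conn.execute(
        "DELETE FROM X WHERE rowid IN "
        "(SELECT rowid FROM X WHERE machine_id = ? AND ctime BETWEEN ? AND ? LIMIT ?)",
        ("m1", 0, 4999, CHUNK))
    conn.commit()              # commit per chunk to keep lock/log pressure bounded
    deleted_total += cur.rowcount
    if cur.rowcount < CHUNK:   # fewer than CHUNK rows deleted: range exhausted
        break

print(deleted_total)  # 5000
```

Committing after each chunk is the reason for looping at all: one huge delete would hold row locks and log space for the whole range at once.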
The deletes have a severe impact on my insert rate/throughput. I see numbers 3 times slower, or even worse. There are absolutely *NO* lock-waits.

Question: Should I expect this performance degradation, or am I missing something? Maybe there is a better technique to delete data.

Any thoughts ?

Thanks in advance,
Sep 18 '08 #1
8 Replies


"Michel Esber" <mi****@us.automatos.com> wrote:
db2set DB2_SKIPDELETED=ON
db2set DB2_SKIPINSERTED=ON

then restart the instance (db2stop, db2start)
Sep 18 '08 #2

db2set DB2_SKIPDELETED=ON
db2set DB2_SKIPINSERTED=ON

then restart the instance (db2stop, db2start)
Hi Mark,

Thanks for your feedback. However, those variables were already set.
Here is my config:

DB2LINUXAIO=TRUE
DB2_SKIPINSERTED=ON
DB2_SCATTERED_IO=ON
DB2_USE_ALTERNATE_PAGE_CLEANING=ON
DB2_EVALUNCOMMITTED=ON
DB2_SKIPDELETED=ON
DB2COMM=tcpip
DB2_PARALLEL_IO=*
DB2AUTOSTART=NO

Maybe v9 has a better locking/concurrency mechanism ?

-M
Sep 19 '08 #3

P: n/a
I was reading Chris Eaton's article:

http://it.toolbox.com/blogs/db2luw/s...statement-7239

And it seems like the use of select from delete may be useful here.

However, here is the problem:

My application has to read up to 500K rows, summarize the data, and then
loop deleting it in chunks of 2K rows.
I need to read ALL the rows within a range, and only after that
start the deletion.

select * from OLD TABLE (delete from X where MACHINE_ID = ? and ctime
between ? and ? fetch first 2000 rows only) doesn't seem to suit my
needs.
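One detail worth noting here: because `current timestamp` is re-evaluated by each statement, the hourly select and the subsequent deletes do not cover exactly the same window. A minimal sketch (SQLite standing in for DB2; table, bounds, and the rowid subquery are illustrative assumptions) of capturing the bounds once and binding the same values in both the read and the delete:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (machine_id TEXT NOT NULL, ctime INTEGER NOT NULL)")
conn.executemany("INSERT INTO X VALUES (?, ?)", [("m1", t) for t in range(3000)])

# Capture the window bounds ONCE so the later delete covers exactly the
# rows that were summarized, instead of a freshly evaluated "now".
lo, hi = 0, 2499

rows = conn.execute(
    "SELECT ctime FROM X WHERE machine_id = ? AND ctime BETWEEN ? AND ?",
    ("m1", lo, hi)).fetchall()
summary = len(rows)          # stand-in for the real summarization step

deleted = 0
while True:
    # Chunked delete over the SAME fixed bounds that were read above.
    cur = conn.execute(
        "DELETE FROM X WHERE rowid IN (SELECT rowid FROM X "
        "WHERE machine_id = ? AND ctime BETWEEN ? AND ? LIMIT 2000)",
        ("m1", lo, hi))
    conn.commit()
    deleted += cur.rowcount
    if cur.rowcount < 2000:
        break

print(summary, deleted)  # 2500 2500
```

With fixed bounds, rows inserted after the summarization pass survive until the next hourly run instead of being deleted unsummarized.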

Any thoughts?

Thanks,
Sep 19 '08 #4

Michel Esber wrote:

select * from OLD TABLE (delete from X where MACHINE_ID = ? and ctime
between ? and ? fetch first 2000 rows only) doesn't seem to suit my
needs.
General question: What does the plan look like? Are you deleting by index,
or does DB2 scan?

Cheers
Serge

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 19 '08 #5

General question: What does the plan look like? Are you deleting by index,
or does DB2 scan?
Hello Serge,

It is deleting by index. Here is the full plan:

******************** EXPLAIN INSTANCE ********************

DB2_VERSION: 08.02.8
SOURCE_NAME: SQLC2E07
SOURCE_SCHEMA: NULLID
SOURCE_VERSION:
EXPLAIN_TIME: 2008-09-19-11.05.35.744822
EXPLAIN_REQUESTER: DB2INST1

Database Context:
----------------
Parallelism: None
CPU Speed: 4.802167e-07
Comm Speed: 0
Buffer Pool size: 189000
Sort Heap size: 2048
Database Heap size: 16000
Lock List size: 8192
Maximum Lock List: 20
Average Applications: 50
Locks Available: 167116

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 203 ----------------
QUERYNO: 1
QUERYTAG:
Statement Type: Searched Delete
Updatable: Not Applicable
Deletable: Not Applicable
Query Degree: 1

Original Statement:
------------------
DELETE
FROM RTM.TBL_COLLECT_PSSTAT_WIN_RTM
where MACHINE_ID = ? and collect_time between ? and ?
Optimized Statement:
-------------------
DELETE
FROM RTM.TBL_COLLECT_PSSTAT_WIN_RTM AS Q1
WHERE $RID$ IN
(SELECT $RID$
FROM RTM.TBL_COLLECT_PSSTAT_WIN_RTM AS Q2
WHERE (:? <= :?) AND (Q2.COLLECT_TIME <= :?) AND (:? <=
Q2.COLLECT_TIME)
AND (Q2.MACHINE_ID = :?))

Access Plan:
-----------
Total Cost: 73.0298
Query Degree: 1

Rows
RETURN
( 1)
Cost
I/O
|
8.39196
DELETE
( 2)
73.0298
11.392
/------+------\
8.39196 2.918e+06
IXSCAN TABLE: RTM
( 3) TBL_COLLECT_PSSTAT_WIN_RTM
19.2689
3
|
2.918e+06
INDEX: DB2INST1
IX_PS_WIN


Extended Diagnostic Information:
--------------------------------

No extended Diagnostic Information for this statement.
Plan Details:
-------------
1) RETURN: (Return Result)
Cumulative Total Cost: 73.0298
Cumulative CPU Cost: 252522
Cumulative I/O Cost: 11.392
Cumulative Re-Total Cost: 53.794
Cumulative Re-CPU Cost: 178061
Cumulative Re-I/O Cost: 8.39196
Cumulative First Row Cost: 73.0298
Estimated Bufferpool Buffers: 12.392

Arguments:
---------
BLDLEVEL: (Build level)
DB2 v8.1.2.136 : special_19546
HEAPUSE : (Maximum Statement Heap Usage)
56 Pages
STMTHEAP: (Statement heap size)
4096

Input Streams:
-------------
4) From Operator #2

Estimated number of rows:
8.39196
Number of columns: 0
Subquery predicate ID: Not
Applicable
2) DELETE: (Delete)
Cumulative Total Cost: 73.0298
Cumulative CPU Cost: 252522
Cumulative I/O Cost: 11.392
Cumulative Re-Total Cost: 53.794
Cumulative Re-CPU Cost: 178061
Cumulative Re-I/O Cost: 8.39196
Cumulative First Row Cost: 73.0298
Estimated Bufferpool Buffers: 12.392

Input Streams:
-------------
2) From Operator #3

Estimated number of rows:
8.39196
Number of columns: 2
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q3.$C0+Q3.$C1
Output Streams:
--------------
3) To Object RTM.TBL_COLLECT_PSSTAT_WIN_RTM

Estimated number of rows: 2.918e+06
Number of columns: 1
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q1.$RID$

4) To Operator #1

Estimated number of rows:
8.39196
Number of columns: 0
Subquery predicate ID: Not
Applicable
3) IXSCAN: (Index Scan)
Cumulative Total Cost: 19.2689
Cumulative CPU Cost: 143426
Cumulative I/O Cost: 3
Cumulative Re-Total Cost: 0.0331183
Cumulative Re-CPU Cost: 68965.2
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 19.245
Estimated Bufferpool Buffers: 4

Arguments:
---------
MAXPAGES: (Maximum pages for prefetch)
1
PREFETCH: (Type of Prefetch)
NONE
ROWLOCK : (Row Lock intent)
UPDATE
SCANDIR : (Scan Direction)
FORWARD
TABLOCK : (Table Lock intent)
INTENT EXCLUSIVE

Predicates:
----------
2) Sargable Predicate
Relational Operator: Less Than or
Equal (<=)
Subquery Input Required: No
Filter Factor: 0.333333

Predicate Text:
--------------
(:? <= :?)

3) Stop Key Predicate
Relational Operator: Less Than or
Equal (<=)
Subquery Input Required: No
Filter Factor: 0.0200165

Predicate Text:
--------------
(Q2.COLLECT_TIME <= :?)

4) Start Key Predicate
Relational Operator: Less Than or
Equal (<=)
Subquery Input Required: No
Filter Factor: 0.0200165

Predicate Text:
--------------
(:? <= Q2.COLLECT_TIME)

5) Start Key Predicate
Relational Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 0.00143678

Predicate Text:
--------------
(Q2.MACHINE_ID = :?)

5) Stop Key Predicate
Relational Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 0.00143678

Predicate Text:
--------------
(Q2.MACHINE_ID = :?)
Input Streams:
-------------
1) From Object DB2INST1.IX_PS_WIN

Estimated number of rows: 2.918e+06
Number of columns: 3
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q2.$RID$+Q2.MACHINE_ID
+Q2.COLLECT_TIME
Output Streams:
--------------
2) To Operator #2

Estimated number of rows:
8.39196
Number of columns: 2
Subquery predicate ID: Not
Applicable

Column Names:
------------
+Q3.$C0+Q3.$C1
Objects Used in Access Plan:
---------------------------

Schema: DB2INST1
Name: IX_PS_WIN
Type: Index
Time of creation:
2008-07-11-18.07.56.326366
Last statistics update:
2008-09-06-10.49.52.692970
Number of columns: 4
Number of rows: 2917997
Width of rows: -1
Number of buffer pool pages: 27158
Distinct row values: Yes
Tablespace name: PSIDXWIN
Tablespace overhead: 6.000000
Tablespace transfer rate: 0.400000
Source for statistics: Single Node
Prefetch page count: 128
Container extent page count: 32
Index clustering statistic: 0.941143
Index leaf pages: 51075
Index tree levels: 4
Index full key cardinality: 2917997
Index first key cardinality: 696
Index first 2 keys cardinality: 28598
Index first 3 keys cardinality: 2917997
Index first 4 keys cardinality: 2917997
Index sequential pages: 51074
Index page density: 98
Index avg sequential pages: 51074
Index avg gap between sequences:0
Index avg random pages: 0
Fetch avg sequential pages: -1
Fetch avg gap between sequences:-1
Fetch avg random pages: -1
Index RID count: 2917997
Index deleted RID count: 0
Index empty leaf pages: 0
Base Table Schema: RTM
Base Table Name:
TBL_COLLECT_PSSTAT_WIN_RTM
Columns in index:
MACHINE_ID
COLLECT_TIME
PROCESS_PID
PROCESS_NAME

Schema: RTM
Name: TBL_COLLECT_PSSTAT_WIN_RTM
Type: Table
Time of creation:
2008-01-17-11.51.05.399878
Last statistics update:
2008-09-06-10.49.52.692970
Number of columns: 12
Number of rows: 2917997
Width of rows: 50
Number of buffer pool pages: 27158
Distinct row values: No
Tablespace name: PSDATWIN
Tablespace overhead: 6.000000
Tablespace transfer rate: 0.400000
Source for statistics: Single Node
Prefetch page count: 128
Container extent page count: 32
Table overflow record count: 0
Table Active Blocks: -1
Sep 19 '08 #6

Just ideas... you may well have ruled them out already...

One way to delete rows fast is to have them clustered by an MDC index
(which can be one-dimensional) with the registry variable DB2_MDC_ROLLOUT
set (since DB2 V8.2, I believe). I just heard about it and would be
interested in how much it speeds up deletes in reality...

Another way of course is DB2 V9 partitioning and dropping partitions.
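As a rough illustration of why dropping partitions is cheap, here is a SQLite emulation using one table per hour (table names are hypothetical; DB2 V9 range partitioning achieves the same effect natively, so the oldest hour disappears as a metadata operation rather than millions of row deletes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Emulate range partitioning with one table per hour.
for hour in (10, 11, 12):
    conn.execute(f"CREATE TABLE X_h{hour} (machine_id TEXT, ctime INTEGER)")
    conn.executemany(f"INSERT INTO X_h{hour} VALUES (?, ?)",
                     [("m1", hour * 100 + i) for i in range(100)])

# "Roll out" the oldest hour: no row-level deletes, no index maintenance
# against the hot insert range.
conn.execute("DROP TABLE X_h10")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['X_h11', 'X_h12']
```

The insert path never touches the partition being dropped, which avoids the buffer pool and index-page contention described in the next reply.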
Sep 19 '08 #7

There is always contention on buffer pool and index root and leaf
pages between the delete and the insert whenever the delete starts,
because the delete always removes the latest rows, which are exactly the
rows the insert process has just inserted. So the performance degradation
is expected.

An RCT (range-clustered) table may be worth a try, at least for less
contention. But there are many restrictions on RCT tables.

Sep 21 '08 #8

I don't really understand what you want to do with these steps...:
1) permanent (and concurrent) inserts with the current timestamp
2) every hour, read / summarize - up to here I understand - then delete
everything until "now" - this I don't understand, because you will delete
un"processed/summarized" rows.

And also: you create an index and then a primary key on the same
columns... this throws an error, because the same columns can't be
used twice for an index.

Just my few cents...

Sep 24 '08 #9
