Bytes | Software Development & Data Engineering Community
Delete and concurrency problems

Hello,

Env: DB2 V8 LUW FP16 running Linux

create Table X (machine_id varchar(24) not null, ctime timestamp not
null);
create index iFoo on X (MACHINE_ID, CTIME) allow reverse scans;
alter table X add primary key (MACHINE_ID, CTIME);

Our C++ application inserts data into table X using CLI array inserts,
with very good throughput.

On an hourly basis, another application (running on a different
server) reads the last 'hour':

select * from X where MACHINE_ID = ? and CTIME between current
timestamp - 1 hour and current timestamp for read only;

After that, it starts a delete loop:

while (SQLCODE < 100)
do
  delete from (select 1 from X where MACHINE_ID = ? and CTIME between
  current timestamp - 1 hour and current timestamp fetch first 2000 rows
  only) as B
done
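One caveat worth flagging in the loop above: CURRENT TIMESTAMP is re-evaluated on every statement, so the window keeps sliding while the loop runs. A sketch that binds the window once (the :lo/:hi host-variable names are illustrative, not from the original code) keeps the read and all delete iterations over the same set of rows:

```sql
-- Sketch only: capture the window once, e.g.
--   VALUES (CURRENT TIMESTAMP - 1 HOUR, CURRENT TIMESTAMP) INTO :lo, :hi
-- then reuse the same bounds for the read and every delete batch.
SELECT * FROM X
  WHERE MACHINE_ID = ? AND CTIME BETWEEN :lo AND :hi
  FOR READ ONLY;

-- repeat until SQLCODE = +100
DELETE FROM (
  SELECT 1 FROM X
    WHERE MACHINE_ID = ? AND CTIME BETWEEN :lo AND :hi
    FETCH FIRST 2000 ROWS ONLY
) AS B;
```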

The deletes have a severe impact on my insert rate/throughput: inserts
run 3 times slower, or even worse. There are absolutely *NO* lock-waits.

Question: should I expect this performance degradation, or am I missing
something? Is there a better technique for deleting this data?

Any thoughts ?

Thanks in advance,
Sep 18 '08 #1
"Michel Esber" <mi****@us.automatos.comwrote in message
news:d3**********************************@v39g2000 pro.googlegroups.com...
Hello,

Env: DB2 V8 LUW FP16 running Linux

create Table X (machine_id varchar(24) not null, ctime timestamp not
null);
create index iFoo on X (MACHINE_ID, CTIME) allow reverse scans;
alter table X add primary key (MACHINE_ID, CTIME);

Our C++ application inserts data into a table X using CLI array insert
with very good throughput.

On an hourly basis, another application (running on a different
server) reads the last 'hour':

select * from T where MACHINE_ID = ? and ctime between current
timestamp - 1 hour and current timestamp for read only;

After that, it starts a delete loop:

while (SQLCODE <100 )
do
delete from (select 1 from T where MACHINE_ID = ? and CTIME between
current timestamp - 1 hour and current timestamp fetch first 2000 rows
only) as B
done
The deletes have severe impact on my insert rate/throughput. I can see
number 3 times slower, or even worse. There are absolutely *NO* lock-
waits.

Question: Should I expect this performance degradation or am I missing
something ? Maybe there is a better technique to delete data.

Any thoughts ?

Thanks in advance,
db2set DB2_SKIPDELETED=ON
db2set DB2_SKIPINSERTED=ON

then restart the instance (db2stop, db2start)
Sep 18 '08 #2
db2set DB2_SKIPDELETED=ON
db2set DB2_SKIPINSERTED=ON

then restart the instance (db2stop, db2start)
Hi Mark,

Thanks for your feedback. However, these variables were already set.
Here is my config:

DB2LINUXAIO=TRUE
DB2_SKIPINSERTED=ON
DB2_SCATTERED_IO=ON
DB2_USE_ALTERNATE_PAGE_CLEANING=ON
DB2_EVALUNCOMMITTED=ON
DB2_SKIPDELETED=ON
DB2COMM=tcpip
DB2_PARALLEL_IO=*
DB2AUTOSTART=NO

Maybe V9 has a better locking/concurrency mechanism?

-M
Sep 19 '08 #3
I was reading Chris Eaton's article:

http://it.toolbox.com/blogs/db2luw/s...statement-7239

And it seems like the use of select from delete may be useful here.

However, here is the problem:

My application has to read up to 500K rows, summarize the data, and
then loop deleting it in chunks of 2K rows.
I need to read ALL the rows within a range and only after that
start the deletion.

select * from OLD TABLE (delete from X where MACHINE_ID = ? and ctime
between ? and ? fetch first 2000 rows only) doesn't seem to suit my
needs.
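For reference, a sketch of the single-statement form from the article, which guarantees the summarized rows and the deleted rows are exactly the same set. It only helps if the summary can be computed inside that one statement, which (as above) is not the case here:

```sql
-- Sketch only: summarize directly from the set being deleted,
-- so the read and the delete cannot drift apart.
SELECT COUNT(*), MIN(CTIME), MAX(CTIME)
  FROM OLD TABLE (
    DELETE FROM X
      WHERE MACHINE_ID = ? AND CTIME BETWEEN ? AND ?
  ) AS deleted;
```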

Any thoughts?

Thanks,
Sep 19 '08 #4
Michel Esber wrote:
General question: what does the plan look like? Are you deleting by
index or does DB2 scan?

Cheers
Serge

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 19 '08 #5
General question: what does the plan look like? Are you deleting by
index or does DB2 scan?
Hello Serge,

It is deleting by index. Here is the full plan:

******************** EXPLAIN INSTANCE ********************

DB2_VERSION: 08.02.8
SOURCE_NAME: SQLC2E07
SOURCE_SCHEMA: NULLID
SOURCE_VERSION:
EXPLAIN_TIME: 2008-09-19-11.05.35.744822
EXPLAIN_REQUESTER: DB2INST1

Database Context:
----------------
Parallelism: None
CPU Speed: 4.802167e-07
Comm Speed: 0
Buffer Pool size: 189000
Sort Heap size: 2048
Database Heap size: 16000
Lock List size: 8192
Maximum Lock List: 20
Average Applications: 50
Locks Available: 167116

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 203 ----------------
QUERYNO: 1
QUERYTAG:
Statement Type: Searched Delete
Updatable: Not Applicable
Deletable: Not Applicable
Query Degree: 1

Original Statement:
------------------
DELETE
FROM RTM.TBL_COLLECT_PSSTAT_WIN_RTM
where MACHINE_ID = ? and collect_time between ? and ?

Optimized Statement:
-------------------
DELETE
FROM RTM.TBL_COLLECT_PSSTAT_WIN_RTM AS Q1
WHERE $RID$ IN
(SELECT $RID$
FROM RTM.TBL_COLLECT_PSSTAT_WIN_RTM AS Q2
WHERE (:? <= :?) AND (Q2.COLLECT_TIME <= :?) AND (:? <=
Q2.COLLECT_TIME)
AND (Q2.MACHINE_ID = :?))

Access Plan:
-----------
Total Cost: 73.0298
Query Degree: 1

                  Rows
                 RETURN
                 (   1)
                  Cost
                   I/O
                   |
                 8.39196
                 DELETE
                 (   2)
                 73.0298
                 11.392
            /------+------\
       8.39196          2.918e+06
       IXSCAN       TABLE: RTM
       (   3)       TBL_COLLECT_PSSTAT_WIN_RTM
       19.2689
          3
          |
       2.918e+06
   INDEX: DB2INST1
      IX_PS_WIN


Extended Diagnostic Information:
--------------------------------

No extended Diagnostic Information for this statement.

Plan Details:
-------------
1) RETURN: (Return Result)
Cumulative Total Cost: 73.0298
Cumulative CPU Cost: 252522
Cumulative I/O Cost: 11.392
Cumulative Re-Total Cost: 53.794
Cumulative Re-CPU Cost: 178061
Cumulative Re-I/O Cost: 8.39196
Cumulative First Row Cost: 73.0298
Estimated Bufferpool Buffers: 12.392

Arguments:
---------
BLDLEVEL: (Build level)
DB2 v8.1.2.136 : special_19546
HEAPUSE : (Maximum Statement Heap Usage)
56 Pages
STMTHEAP: (Statement heap size)
4096

Input Streams:
-------------
4) From Operator #2

Estimated number of rows:
8.39196
Number of columns: 0
Subquery predicate ID: Not Applicable
2) DELETE: (Delete)
Cumulative Total Cost: 73.0298
Cumulative CPU Cost: 252522
Cumulative I/O Cost: 11.392
Cumulative Re-Total Cost: 53.794
Cumulative Re-CPU Cost: 178061
Cumulative Re-I/O Cost: 8.39196
Cumulative First Row Cost: 73.0298
Estimated Bufferpool Buffers: 12.392

Input Streams:
-------------
2) From Operator #3

Estimated number of rows:
8.39196
Number of columns: 2
Subquery predicate ID: Not Applicable

Column Names:
------------
+Q3.$C0+Q3.$C1

Output Streams:
--------------
3) To Object RTM.TBL_COLLECT_PSSTAT_WIN_RTM

Estimated number of rows: 2.918e+06
Number of columns: 1
Subquery predicate ID: Not Applicable

Column Names:
------------
+Q1.$RID$

4) To Operator #1

Estimated number of rows:
8.39196
Number of columns: 0
Subquery predicate ID: Not Applicable
3) IXSCAN: (Index Scan)
Cumulative Total Cost: 19.2689
Cumulative CPU Cost: 143426
Cumulative I/O Cost: 3
Cumulative Re-Total Cost: 0.0331183
Cumulative Re-CPU Cost: 68965.2
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 19.245
Estimated Bufferpool Buffers: 4

Arguments:
---------
MAXPAGES: (Maximum pages for prefetch)
1
PREFETCH: (Type of Prefetch)
NONE
ROWLOCK : (Row Lock intent)
UPDATE
SCANDIR : (Scan Direction)
FORWARD
TABLOCK : (Table Lock intent)
INTENT EXCLUSIVE

Predicates:
----------
2) Sargable Predicate
Relational Operator: Less Than or
Equal (<=)
Subquery Input Required: No
Filter Factor: 0.333333

Predicate Text:
--------------
(:? <= :?)

3) Stop Key Predicate
Relational Operator: Less Than or
Equal (<=)
Subquery Input Required: No
Filter Factor: 0.0200165

Predicate Text:
--------------
(Q2.COLLECT_TIME <= :?)

4) Start Key Predicate
Relational Operator: Less Than or
Equal (<=)
Subquery Input Required: No
Filter Factor: 0.0200165

Predicate Text:
--------------
(:? <= Q2.COLLECT_TIME)

5) Start Key Predicate
Relational Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 0.00143678

Predicate Text:
--------------
(Q2.MACHINE_ID = :?)

5) Stop Key Predicate
Relational Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 0.00143678

Predicate Text:
--------------
(Q2.MACHINE_ID = :?)

Input Streams:
-------------
1) From Object DB2INST1.IX_PS_WIN

Estimated number of rows: 2.918e+06
Number of columns: 3
Subquery predicate ID: Not Applicable

Column Names:
------------
+Q2.$RID$+Q2.MACHINE_ID
+Q2.COLLECT_TIME

Output Streams:
--------------
2) To Operator #2

Estimated number of rows:
8.39196
Number of columns: 2
Subquery predicate ID: Not Applicable

Column Names:
------------
+Q3.$C0+Q3.$C1

Objects Used in Access Plan:
---------------------------

Schema: DB2INST1
Name: IX_PS_WIN
Type: Index
Time of creation:
2008-07-11-18.07.56.326366
Last statistics update:
2008-09-06-10.49.52.692970
Number of columns: 4
Number of rows: 2917997
Width of rows: -1
Number of buffer pool pages: 27158
Distinct row values: Yes
Tablespace name: PSIDXWIN
Tablespace overhead: 6.000000
Tablespace transfer rate: 0.400000
Source for statistics: Single Node
Prefetch page count: 128
Container extent page count: 32
Index clustering statistic: 0.941143
Index leaf pages: 51075
Index tree levels: 4
Index full key cardinality: 2917997
Index first key cardinality: 696
Index first 2 keys cardinality: 28598
Index first 3 keys cardinality: 2917997
Index first 4 keys cardinality: 2917997
Index sequential pages: 51074
Index page density: 98
Index avg sequential pages: 51074
Index avg gap between sequences:0
Index avg random pages: 0
Fetch avg sequential pages: -1
Fetch avg gap between sequences:-1
Fetch avg random pages: -1
Index RID count: 2917997
Index deleted RID count: 0
Index empty leaf pages: 0
Base Table Schema: RTM
Base Table Name:
TBL_COLLECT_PSSTAT_WIN_RTM
Columns in index:
MACHINE_ID
COLLECT_TIME
PROCESS_PID
PROCESS_NAME

Schema: RTM
Name: TBL_COLLECT_PSSTAT_WIN_RTM
Type: Table
Time of creation:
2008-01-17-11.51.05.399878
Last statistics update:
2008-09-06-10.49.52.692970
Number of columns: 12
Number of rows: 2917997
Width of rows: 50
Number of buffer pool pages: 27158
Distinct row values: No
Tablespace name: PSDATWIN
Tablespace overhead: 6.000000
Tablespace transfer rate: 0.400000
Source for statistics: Single Node
Prefetch page count: 128
Container extent page count: 32
Table overflow record count: 0
Table Active Blocks: -1
Sep 19 '08 #6
Just ideas... you may well have ruled them out already...

One way to delete rows fast is to have them clustered by an MDC index
(it can be one-dimensional) with the registry variable DB2_MDC_ROLLOUT
set (available since DB2 V8.2, I believe). I have only just heard about
it and would be interested in how much it speeds up deletes in practice...
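For what it's worth, a minimal sketch of such an MDC layout (the ctime_hour generated column and its expression are illustrative assumptions, not part of the original schema):

```sql
-- Hypothetical MDC sketch: cluster on a coarse time dimension so that
-- deleting an old range can roll out whole blocks instead of single rows.
CREATE TABLE X (
  machine_id VARCHAR(24) NOT NULL,
  ctime      TIMESTAMP   NOT NULL,
  ctime_hour DATE GENERATED ALWAYS AS (DATE(ctime))
) ORGANIZE BY DIMENSIONS (ctime_hour);

-- and, where supported, enable rollout:
-- db2set DB2_MDC_ROLLOUT=ON
```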

Another way, of course, is DB2 V9 table partitioning and dropping
partitions.
Sep 19 '08 #7
There is always contention on buffer pool pages and on index root and
leaf pages between the delete and the insert once the delete starts,
because the delete always removes the latest rows, which are exactly
the rows the insert process has just inserted. So the performance
degradation is expected.

An RCT (range-clustered table) may be worth trying, at least for less
contention. But there are many restrictions on RCT tables.
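For reference, a rough shape of a range-clustered table (the seq_no key is an assumption for illustration; RCTs need a dense, bounded integer key, which the (MACHINE_ID, CTIME) schema does not naturally provide):

```sql
-- Hypothetical RCT sketch: rows live in preallocated slots addressed
-- by an integer key, so inserts and deletes do not contend on
-- B-tree leaf pages.
CREATE TABLE X_RCT (
  seq_no     INTEGER     NOT NULL,
  machine_id VARCHAR(24) NOT NULL,
  ctime      TIMESTAMP   NOT NULL
) ORGANIZE BY KEY SEQUENCE
  (seq_no STARTING FROM 1 ENDING AT 10000000)
  ALLOW OVERFLOW;
```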

Sep 21 '08 #8
I don't really understand what you want to do with these steps:
1) permanent (and concurrent) inserts with the current timestamp
2) every hour, read / summarize - up to here I understand - then delete
everything until "now". This I don't understand, because you will end
up deleting unprocessed/unsummarized rows.

And also: you create an index and then a primary key on the same
columns... this throws an error, because the same columns can't be
used twice for an index.

Just my few cents...

Sep 24 '08 #9
