Bytes IT Community

Question: multiple row insert vs. merge

My app needs to insert thousands of rows into a mostly empty table
(the data are read from a file). I can either use inserts or use merge.
The advantage of using merge is that in the few cases where the table
is not empty, it takes care of the updating part, which makes the
app cleaner.

However, my concern is that the merge statement would slow down the
insertion of new data, since in most cases the table is empty.

So my questions (before I run any comprehensive tests) are:

1. Would merge be slower than insert if the table is empty?
2. Given that we can do multi-row value inserts, would insert be
even faster than merge, which has no multi-row option?

Thanks a lot!
Sep 2 '08 #1
24 Replies


Henry J. wrote:
> [quoted question snipped]
Is this DB2 for zOS? Which version?
Keep in mind that DB2 for zOS supports a NOT ATOMIC "array merge".
And of course DB2 for LUW supports regular MERGE with a VALUES
clause that can have multiple rows (or be based on a query).

Cheers
Serge

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 2 '08 #2

On Sep 2, 4:59 pm, Serge Rielau <srie...@ca.ibm.com> wrote:
> Is this DB2 for zOS? Which version?
> Keep in mind that DB2 for zOS supports a NOT ATOMIC "array merge".
> And of course DB2 for LUW supports regular MERGE with a VALUES
> clause that can have multiple rows (or be based on a query).
The DB2 version is DB2/LINUXX8664 9.1.4.
Sep 2 '08 #3

Henry J. wrote:
On Sep 2, 4:59 pm, Serge Rielau <srie...@ca.ibm.com> wrote:
> Is this DB2 for zOS? Which version?
> Keep in mind that DB2 for zOS supports a NOT ATOMIC "array merge".
> And of course DB2 for LUW supports regular MERGE with a VALUES
> clause that can have multiple rows (or be based on a query)

The DB2 version is DB2/LINUXX8664 9.1.4.
OK, then I don't understand your comment on multi-row MERGE.
Also, if the target table is empty, the MERGE will be quick to decide
that no row matches. So I'm not worried about performance.

When in doubt compare the plans (and test).

Cheers
Serge

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 2 '08 #4

On Sep 2, 6:28 pm, Serge Rielau <srie...@ca.ibm.com> wrote:
> OK, then I don't understand your comment on multi-row MERGE.
> Also, if the target table is empty, the MERGE will be quick to decide
> that no row matches. So I'm not worried about performance.
>
> When in doubt compare the plans (and test).
Thanks Serge. Here are two notes:

1) The app is written in C++ using embedded SQL with parameter
binding. Depending on how "insert into table values(?,?,?)" is
implemented, I guess it is possible for the C++ side to figure out
that it should use "insert into table values (?,?,?), (?,?,?), ...,
(?,?,?)", which could be faster?

2) Even though the table is empty at first, since the app does the
value inserts in batches, the table will gradually fill up, and it
would take more and more time to find that there is no match before
the insertion can actually be done?

Thanks!
Sep 3 '08 #5

Henry J. wrote:
> 1) The app is written in C++ using embedded SQL with parameter
> binding. Depending on how "insert into table values(?,?,?)" is
> implemented, I guess it is possible for the C++ side to figure out
> that it should use "insert into table values (?,?,?), (?,?,?), ...,
> (?,?,?)", which could be faster?
VALUES (?, ?, ?), (?, ?, ?), ... is the correct syntax.
Yes, that would be faster than individual inserts.

2) Even though the table is empty at first, since the app does the
value inserts in batches, the table will gradually fill up, and it
would take more and more time to find that there is no match before
the insertion can actually be done?
I do assume that you have an index....
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 3 '08 #6
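Since n varies at run time, the multi-row form Serge confirms here has to be generated with the right number of parameter-marker groups. A small sketch, in Python purely for illustration, of how an application might build the statement text (the function name and table name are placeholders, not from the thread):

```python
def multi_row_insert(table, n_cols, n_rows):
    """Build an INSERT with n_rows groups of n_cols parameter markers,
    e.g. INSERT INTO t VALUES (?, ?, ?), (?, ?, ?)."""
    group = "(" + ", ".join(["?"] * n_cols) + ")"
    return f"INSERT INTO {table} VALUES " + ", ".join([group] * n_rows)

# Two rows of three markers each:
print(multi_row_insert("t", 3, 2))
# INSERT INTO t VALUES (?, ?, ?), (?, ?, ?)
```

The generated text is then prepared once per distinct row count and executed with 3 * n_rows bound parameters.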

VALUES (?, ?, ?), (?, ?, ?), .... is the correct syntax

Imagine I have to insert n rows, and n is a different number every
time.

What is the recommended method from an embedded program/ODBC/JDBC (I
guess the same applies to all three):

- insert into t (c1, c2) values (?, ?) (n statements with one row each)
- insert into t (c1, c2) values (?, ?), (?, ?), (?, ?), ... (1 statement
with n rows)
- insert into t (c1, c2) values (?, ?), (?, ?), ... (split the n rows
into n/100 inserts, and do the remainder in single-row inserts)?

I think this answer would be an ideal pick for an FAQ, as it seems to
be asked again and again... Is there such an FAQ for c.d.ibm-db2?

Has anyone tried it out performance-wise? My guess is the last one
would be the fastest if such inserts occur very often (no recompile of
the statement), and the second one might be faster if the inserts occur
only once in a while.

Sep 4 '08 #7

w.*********@googlemail.com wrote:
> What is the recommended method from an embedded program/ODBC/JDBC?
> [options snipped]
> I think this answer would be an ideal pick for an FAQ, as it seems to
> be asked again and again... Is there such an FAQ for c.d.ibm-db2?
Google for "SQL on Fire". I describe the same technique there.
Now, some DB2 clients (CLI/ODBC for sure; JDBC I don't know) do support
array or buffered insert.
In that case the client gathers up all the insert variables and sends
them to the server in one shot. The server then executes a single-row
insert in a tight loop.

Cheers
Serge

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 4 '08 #8
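The buffered/array insert Serge describes (the client gathers all parameter sets, ships them to the server in one shot, and the server runs a prepared single-row insert in a tight loop) is what most client APIs expose as an executemany-style call. A runnable sketch using Python's DB-API, with sqlite3 standing in for DB2 purely so the example executes; DB2 CLI's actual array-input interface differs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 INTEGER, c2 TEXT)")

# The client gathers all the parameter tuples...
rows = [(i, f"row{i}") for i in range(1000)]

# ...and sends them with one call; the engine then runs the
# prepared single-row INSERT in a tight loop over the batch.
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1000
```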

Google for "SQL on Fire". I describe the same technique there.

Found it :-)

Quote:
"Tip:

For mass inserts prepare INSERT statements with:
1, 10, 100, 1000, 10000 rows of parameter markers each.
Execute the biggest that fits the remaining load in a loop."

Thanks!
Sep 4 '08 #9
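The tip from the "SQL on Fire" slides (prepare INSERTs with 1, 10, 100, 1000, 10000 rows of markers and execute the biggest that fits the remaining load) amounts to a greedy decomposition of the row count over the prepared statement sizes. A small sketch; the function name is mine:

```python
def batch_sizes(n, prepared=(10000, 1000, 100, 10, 1)):
    """Greedy decomposition: always execute the biggest prepared
    statement that still fits the remaining row count."""
    out = []
    for size in prepared:
        while n >= size:
            out.append(size)
            n -= size
    return out

print(batch_sizes(2345))
# [1000, 1000, 100, 100, 100, 10, 10, 10, 10, 1, 1, 1, 1, 1]
```

With power-of-ten sizes, any n needs at most nine executions per prepared size, so only five statements ever have to be prepared.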

On Sep 4, 7:15 am, Serge Rielau <srie...@ca.ibm.com> wrote:
> Now, some DB2 clients (CLI/ODBC for sure; JDBC I don't know) do support
> array or buffered insert.
> In that case the client gathers up all the insert variables and sends
> them to the server in one shot. The server then executes a single-row
> insert in a tight loop.
If I send a bunch of single-row inserts like "insert into t
values(?,?); insert into t values(?,?);
... insert into t values(?,?)" via CLI/ODBC to DB2 in one *batch*,
will CLI/ODBC know to convert it to "insert into t values(?,?),
(?,?), ..., (?,?)" before executing it?

Sep 5 '08 #10

Henry J. wrote:
> If I send a bunch of single-row inserts via CLI/ODBC to DB2 in one
> *batch*, will CLI/ODBC know to convert it to "insert into t
> values(?,?), (?,?), ..., (?,?)" before executing it?
No. CLI/ODBC will shred the statements and send them one by one.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 5 '08 #11

Hi!

If you really want good performance inserting rows into a table,
and assuming your application can be designed this way:
1. Write the rows to be inserted or updated to a diskfile.
2. Use the load utility to load the file into a worktable.
3. Use the merge statement to update the target table from the worktable.

/dg
"Henry J." <ta********@yahoo.com> wrote:
> [quoted question snipped]

Sep 6 '08 #12

On Sep 5, 7:23 am, Serge Rielau <srie...@ca.ibm.com> wrote:
> No. CLI/ODBC will shred the statements and send them one by one.
Good. Another question: if I send a list of inserts each with two
value tuples, like "insert into t values(?,?), (?,?)", instead of a
list of "insert into t values(?,?)", would it speed up the insertion?
Thanks a lot! This is really helpful, Serge.
Sep 7 '08 #13

On Sep 6, 3:22 am, "Dan van Ginhoven" <danfa...@hotmail.com> wrote:
Hi!

If you really want good performance, inserting rows into a table,
assuming your application can be designed this way:

1. Write the rows to be inserted or updated to a diskfile.
2. Use the load utility to load the file into a worktable.
3. Use the merge statement to update the target table from the worktable.

/dg
Good tip. Questions:

1) What kind of load utility is best for the method you mentioned? In
Sybase, bcp is very good at this. I haven't heard of its counterpart
in DB2 yet.
2) Instead of writing to a file and then uploading it into a
worktable, how about inserting into the worktable directly? Will it be
slower than your method? Or would it still be faster than inserting
into the target table directly? Note that it would be much simpler to
implement.

Thanks!

Sep 7 '08 #14

Henry J. wrote:

Good. Another question: if I send a list of inserts each with two
value tuples, like "insert into t values(?,?), (?,?)", instead of a
list of "insert into t values(?,?)", would it speed up the insertion?
Thanks a lot! This is really helpful, Serge.
Yes. I did some measurements a while ago (very unscientific) and found
that it makes sense to bundle up to 100 rows. After that it doesn't make
a difference. Of course network latency plays a role (my client was local).

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 7 '08 #15

1) What kind of load utility is best for the method you mentioned? In
Sybase, bcp is very good at this. I haven't heard of its counterpart
in DB2 yet.
Search the documentation for the "load" command. This example loads
from a file in CSV format:

load from file.csv of del replace into <tablename> nonrecoverable;
2) Instead of writing to a file and then uploading it into a
worktable, how about inserting into the worktable directly? Will it be
slower than your method? Or would it still be faster than inserting
into the target table directly? Note that it would be much simpler to
implement.
I don't know, but I guess it will be slower rather than faster.
Sep 9 '08 #16

"Henry J." <ta********@yahoo.com> wrote:
On Sep 6, 3:22 am, "Dan van Ginhoven" <danfa...@hotmail.com> wrote:
Hi!

If you really want good performance, inserting rows into a table,
assuming your application can be designed this way:

1. Write the rows to be inserted or updated to a diskfile.
2. Use the load utility to load the file into a worktable.
3. Use the merge statement to update the target table from the worktable.

/dg

Good tip. Questions:

1) what kind of load utility is best for the method you mentioned? In
Sybase, bcp is very good at this. I haven't heard of its counterpart
in DB2 yet.
db2 load is as fast as bcp but the syntax is more complex.
2) Instead of writing to a file and then uploading it into a
worktable, how about inserting into a worktable directly? Will it be
slower than your method? or would it still be faster than inserting
into the target table directly? Note that it would be much simpler to
implement.
It depends on the number of rows.
Using db2 load means going outside of DB2 and accessing it from the
command line. db2 load takes some time to initialize. There is a
break-even point somewhere, and I guess it is above some thousands of
rows. So unless you are in a batch situation, as opposed to online,
I wouldn't try load.
/dg
Thanks!

Sep 9 '08 #17

On Sep 7, 2:48 pm, Serge Rielau <srie...@ca.ibm.com> wrote:
> Yes. I did some measurements a while ago (very unscientific) and found
> that it makes sense to bundle up to 100 rows. After that it doesn't
> make a difference. Of course network latency plays a role (my client
> was local).
Hope this is my last question in this thread. Now I know it is faster
to send a batch of 1000
"insert into t values(?,?), (?,?)" than to send 2000 "insert into t
values(?,?)". Is it equally fast if I send 1000 merge statements like
"merge into t using table (values(?,?), (?,?)) ..."? Faster than 2000
"merge into t using table (values(?,?)) ...", right?

I know I can find out by testing but tips from experts like you guys
save me tons of time ... Really appreciate it.

Thanks!
Sep 10 '08 #18

Henry J. wrote:
> Is it equally fast if I send 1000 merge statements like
> "merge into t using table (values(?,?), (?,?)) ..."? Faster than 2000
> "merge into t using table (values(?,?)) ...", right?

I know I can find out by testing but tips from experts like you guys
save me tons of time ... Really appreciate it.
MERGE is a special child...
If you use MERGE straight with a multi-row VALUES, DB2 does not know
whether the new rows match unique rows in the MERGE target.
If DB2 can't rely on that, the MERGE statement gets a lot more complex
(and slower). Here is how to do it right:

MERGE INTO T USING (SELECT pk, MAX(c1), MAX(c2) FROM (VALUES (?, ?, ?),
...) AS V(pk, c1, c2)) AS S(pk, c1, c2)
ON T.pk = S.pk
WHEN MATCHED THEN UPDATE SET c1 = S.c1, c2 = S.c2
WHEN NOT MATCHED THEN INSERT VALUES (pk, c1, c2)

Note that in the case of 200 single-row merges the semantics are
different. You can absolutely have dups, since the MERGEs are cumulative.

Cheers
Serge

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 10 '08 #19

On Sep 10, 10:27 am, Serge Rielau <srie...@ca.ibm.com> wrote:
> If you use a MERGE straight with a multi-row VALUES DB2 does not know
> whether the new rows match to unique rows in the MERGE target.
>
> MERGE INTO T USING (SELECT pk, MAX(c1), MAX(c2) FROM (VALUES (?, ?, ?),
> ...) AS V(pk, c1, c2)) AS S(pk, c1, c2)
> ON T.pk = S.pk
> WHEN MATCHED THEN UPDATE SET c1 = S.c1, c2 = S.c2
> WHEN NOT MATCHED THEN INSERT VALUES (pk, c1, c2)
Do you mean that we can indeed achieve a similar performance boost
using merge? I'm not sure I understand your trick with "select
max ...". Do you intend to do a "group by"? Can we use union to
remove dupes, as in "values(?,?,?) union values(?,?,?) union ..."?

Thanks!
Sep 17 '08 #20

Henry J. wrote:

Do you mean that we can indeed achieve a similar performance boost
using merge? I'm not sure I understand your trick with "select
max ...". Do you intend to do a "group by"? Can we use union to
remove dupes, as in "values(?,?,?) union values(?,?,?) union ..."?
Yes, you need a GROUP BY pk. If you want to make it messier use UNION.
But the result is the same.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 18 '08 #21
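Putting Serge's statement from #19 together with the GROUP BY pk he confirms here, a generator for the key-deduplicated multi-row MERGE text might look like the sketch below (Python purely for illustration; the table and column names follow Serge's example, and the function name is mine):

```python
def dedup_merge(n_rows):
    """Generate Serge's MERGE for n_rows of (pk, c1, c2) parameter
    markers, with the VALUES source collapsed by GROUP BY pk so DB2
    can rely on each source key being unique."""
    values = ", ".join(["(?, ?, ?)"] * n_rows)
    return (
        "MERGE INTO T USING "
        f"(SELECT pk, MAX(c1), MAX(c2) FROM (VALUES {values}) "
        "AS V(pk, c1, c2) GROUP BY pk) AS S(pk, c1, c2) "
        "ON T.pk = S.pk "
        "WHEN MATCHED THEN UPDATE SET c1 = S.c1, c2 = S.c2 "
        "WHEN NOT MATCHED THEN INSERT VALUES (S.pk, S.c1, S.c2)"
    )

print(dedup_merge(2))
```

As with the multi-row INSERT, the statement can be prepared once per distinct row count and reused.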

On Sep 17, 8:04 pm, Serge Rielau <srie...@ca.ibm.com> wrote:
> Yes, you need a GROUP BY pk. If you want to make it messier use UNION.
> But the result is the same.
So we need to use "group by" to let the optimizer know that there are
no dupe values?

How about "insert into t values (?,?), (?,?)"? It can also contain
dupe values?

Note that our app will never send dupe values to be inserted or
merged.

Thanks.
Sep 18 '08 #22

Henry J. wrote:
Note that our app will never send dupe values to be inserted or
merged.
DB2 cannot know that. The only way for DB2 to know is to GROUP BY the
PK. Note that UNION actually (I thought more about it) does NOT work.
UNION assures uniqueness of the row. That is not enough. You need
uniqueness of the key used in the MERGE ON clause.

Cheers
Serge

PS: I think we have given you all the bread crumbs you need, and I am
on my way to a beach. So run experiments. Look at the optimizer plan, ...

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 18 '08 #23
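Serge's distinction can be seen directly: UNION deduplicates whole rows, so two source rows with the same pk but different payloads both survive, while GROUP BY pk collapses them to one row per key. A runnable illustration, with sqlite3 standing in for DB2 just so the queries execute:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same pk (1) with two different payloads: these are distinct *rows*,
# so UNION keeps both, and the key is still duplicated...
union = conn.execute(
    "SELECT 1 AS pk, 'a' AS c1 UNION SELECT 1, 'b'"
).fetchall()
print(len(union))  # 2

# ...while GROUP BY pk guarantees exactly one source row per key,
# which is what the MERGE ON clause needs to rely on.
grouped = conn.execute(
    "SELECT pk, MAX(c1) FROM (SELECT 1 AS pk, 'a' AS c1 "
    "UNION SELECT 1, 'b') GROUP BY pk"
).fetchall()
print(len(grouped))  # 1
```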

On Sep 18, 9:02 am, Serge Rielau <srie...@ca.ibm.com> wrote:
> DB2 cannot know that. The only way for DB2 to know is to GROUP BY the
> PK. Note that UNION actually (I thought more about it) does NOT work.
Thanks Serge. Really appreciated. I got all I need to know. Have
you written any books? A true expert and kind soul. Wish you a great
time on the beach.
Sep 18 '08 #24

Henry J. wrote:
> I got all I need to know. Have you written any books? A true expert
> and kind soul. Wish you a great time on the beach.
Hah! It takes a patient soul to write a book. I am many things, but not
that.

Cheers
Serge

--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Sep 18 '08 #25
