469,310 Members | 2,750 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,310 developers. It's quick & easy.

multiple insertions through jdbc

Hi everybody!

I'm using DB2 PE v8.2.3 for linux. I've defined a database with the
following schema:

ANNOTATION(ID,AUTHOR,TEXT)
ANNOTATION_BOOK(ANNOTATION_ID,OBJECT_ID)
BOOK(ID,AUTHOR,TITLE).

Between the book and annotation entities there is a many-to-many
relation: one annotation can involve many books and a book can be
annotated by many annotations. So the attributes ANNOTATION_ID and
OBJECT_ID of the ANNOTATION_BOOK table are foreign keys referring to the
ANNOTATION and BOOK tables ids, respectively. I'm using two sequences
for the attributes ID of tables ANNOTATION and BOOK: annotation_id_seq
and book_id_seq.

I've to make batch insertions in this database that are the result of an
algorithm. The algorithm output is a set made up of thousands of
annotations involving a small number of books. These books might
already been stored in the database and obviously I don't want to have
multiple tuples that represent the same book, i.e., have the same values
for the attributes AUTHOR and TITLE.

The result of the algorithm has to be stored in the database through
JDBC. I'm using the driver in the packages: db2jcc.jar and db2java.jar.
My current solution is:

1. Query the table BOOK to determine the ids of the books that are
already persistent.

2. Inserting the book annotations in sequence in a batch. More
specifically:
2.1 Creating an instance of object java.sql.Statement, let's call it
statement.
2.2 Adding an sql insert command for table ANNOTATION to the statement:

insert into ANNOTATION(ID,AUTHOR,TEXT)
values(next value for annotation_id_seq,...).

2.3 For each book involved in the book annotation:
2.3.1(a) If the book is not already persistent add two sql insert
command for table BOOK and table ANNOTATION_BOOK to the statement:

insert into BOOK(ID,AUTHOR,TITLE)
values(next value for public.book_id_seq,...)

insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID)
values(previous value for annotation_id_seq,
previous value for book_id_seq);

2.3.1(b) If the book annotation involves a book that is already
persistent add one sql insert command for table ANNOTATION_BOOK with the
value that you retrieved at point 1 for the attribute BOOK_ID:

insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID)
values(previous value for annotation_id_seq,
<value_retrieved_at_point_1>);
2.4 Execute the batch:

statement.executeBatch();
This solution is not very efficient. I would like to improve the
insertion performance. Maybe after retrieving the ids of the persistent
books (point 1), I could ask the dbms to increase the book and
annotation sequence by how much I need and then insert the tuples one
table at the time, maybe using a prepared statement instead of a statement.

Do you have any suggestion?

Thank you very much,

Diego
Feb 15 '06 #1
2 2729
The nature of this question and the use of PE indicates this could be a
class problem.

The key to improving performance here is to minimize the overhead.
1. Use parameter markers and multiple statements.
2. Figure out a way to make the database manager do some of the work for
you.
3. Consider inserting multiple ANNOTATION_BOOK rows with each statement.

You'll have problems with your 1. because you've made no allowance for
the books you'll be adding to the table. You've also said nothing about
how you will be determining the id for the persistant book associated
with the current annotation. You've also said nothing about determining
that an annotation you are processing already exists in the annotation
table. (It's possible that all annotations are considered new and get a
new row in their table.)

Is a sequence appropriate? Would identity columns for BOOK and
ANNOTATION be an alternative?
Phil Sherman

Diego wrote:
Hi everybody!

I'm using DB2 PE v8.2.3 for linux. I've defined a database with the
following schema:

ANNOTATION(ID,AUTHOR,TEXT)
ANNOTATION_BOOK(ANNOTATION_ID,OBJECT_ID)
BOOK(ID,AUTHOR,TITLE).

Between the book and annotation entities there is a many-to-many
relation: one annotation can involve many books and a book can be
annotated by many annotations. So the attributes ANNOTATION_ID and
OBJECT_ID of the ANNOTATION_BOOK table are foreign keys referring to the
ANNOTATION and BOOK tables ids, respectively. I'm using two sequences
for the attributes ID of tables ANNOTATION and BOOK: annotation_id_seq
and book_id_seq.

I've to make batch insertions in this database that are the result of an
algorithm. The algorithm output is a set made up of thousands of
annotations involving a small number of books. These books might
already been stored in the database and obviously I don't want to have
multiple tuples that represent the same book, i.e., have the same values
for the attributes AUTHOR and TITLE.

The result of the algorithm has to be stored in the database through
JDBC. I'm using the driver in the packages: db2jcc.jar and db2java.jar.
My current solution is:

1. Query the table BOOK to determine the ids of the books that are
already persistent.

2. Inserting the book annotations in sequence in a batch. More
specifically:
2.1 Creating an instance of object java.sql.Statement, let's call it
statement.
2.2 Adding an sql insert command for table ANNOTATION to the
statement:

insert into ANNOTATION(ID,AUTHOR,TEXT)
values(next value for annotation_id_seq,...).

2.3 For each book involved in the book annotation:
2.3.1(a) If the book is not already persistent add two sql
insert command for table BOOK and table ANNOTATION_BOOK to the statement:

insert into BOOK(ID,AUTHOR,TITLE)
values(next value for public.book_id_seq,...)

insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID)
values(previous value for annotation_id_seq,
previous value for book_id_seq);

2.3.1(b) If the book annotation involves a book that is already
persistent add one sql insert command for table ANNOTATION_BOOK with the
value that you retrieved at point 1 for the attribute BOOK_ID:

insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID)
values(previous value for annotation_id_seq,
<value_retrieved_at_point_1>);
2.4 Execute the batch:

statement.executeBatch();
This solution is not very efficient. I would like to improve the
insertion performance. Maybe after retrieving the ids of the persistent
books (point 1), I could ask the dbms to increase the book and
annotation sequence by how much I need and then insert the tuples one
table at the time, maybe using a prepared statement instead of a statement.

Do you have any suggestion?

Thank you very much,

Diego

Feb 17 '06 #2
Hi Phil,

I've made some modifications to this procedure that go in the direction
you were suggesting in your points 1,2 and 3 and I had a good improvement.
You've also said nothing about how you will be determining the id for the persistant book associated with the current annotation.
I assume that the pair of attributes (AUTHOR, TITLE) of the relation
BOOK is unique (actually the db is more complex than that but this was a
significant simplification). So I query BOOK with two values for this
pair of attributes, e.g.:

select *
from BOOK " +
where AUTHOR = ? and TITLE = ?
You've also said nothing about determining that an annotation you are processing already exists in the annotation table. (It's possible that all annotations are considered new and get a new row in their table.)
Yes, that's exactly the case. All annotation are considered new and get
a new row.
Is a sequence appropriate? Would identity columns for BOOK and ANNOTATION be an alternative?


In the new procedure I cannot use identity columns.
The new procedure works as follows:

1. Query the table BOOK to determine the ids of the books that are
already persistent. To do that I execute a PreparedStatement for each
book:

String query = "select * " +
"from public.BOOK " +
"where AUTHOR = ? and TITLE = ?";
preparedStatement = connection.prepareStatement(query);
for each book
preparedStatement.setXXX(1,...);
preparedStatement.setXXX(2,...);
resultSet = preparedStatement.executeQuery();

As you suggested in point 1, I tried to execute a multiple statement
(using addBatch) with a preparedStatement but it is not supported with a
SELECT.

2. Request as many ids as needed for the annotations from the sequences
that manages the ANNOTATION table id:

preparedStatement = connection.prepareStatement("values nextval for
public.annotation_id_seq");
for each annotation
ResultSet resultSet = preparedStatement.getResultSet();

3. Request as many ids as needed for the books.

4. Inserts tuple in ANNOTATION table.
5. Inserts tuple in BOOK_ANNOTATION table.
6. Insert tuples in BOOK table.

Points 4,5 and 6 are implemented through a prepared statement batch
(what you called in your point "multiple statement", if I got it right).
As an example, consider point 4:

String sql = "insert into public.annotation(id,author,text) " +
"values(?,?,?);";
preparedStatement = connection.prepareStatement(sql);
for each annotation
preparedStatement.setXXX(1,...);
preparedStatement.setXXX(2,...);
preparedStatement.setXXX(3,...);
preparedStatement.addBatch();

This is analogous to what I intended to do for point 1. But apparently
multiple sql commands with prepared statements don't work with SELECTs.

Thanks a lot!

Diego
Feb 20 '06 #3

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

reply views Thread by Nils Valentin | last post: by
1 post views Thread by Sergio | last post: by
reply views Thread by Matthew Lunnon | last post: by
3 posts views Thread by Anoop | last post: by
9 posts views Thread by mak1084 | last post: by
1 post views Thread by CARIGAR | last post: by
reply views Thread by zhoujie | last post: by
reply views Thread by suresh191 | last post: by
reply views Thread by harlem98 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.