
LOAD vs. IMPORT


Hi.

Can any of you explain the major differences between LOAD and IMPORT in
layman's terms?

I've read the DB2 docs: "[IMPORT] Inserts data from an external file
with a supported file format into a table, hierarchy, or view. A faster
alternative is LOAD; however, the load utility does not support loading
data at the hierarchy level."

What does "loading data at the hierarchy level" imply? I have 5 tables
with no (enforced) referential constraints. The biggest table has
500,000 rows, and if possible I would like to avoid locking the table
for long (I would rather return an empty result set than make the user
wait).

Given my situation, what are the pros and cons of the two?

Thanks.

Morten

Nov 12 '05 #1
13 Replies


use...@kikobu.com wrote:
Hi.

Can any of you explain the major differences between LOAD and IMPORT in
layman's terms?


The biggest difference is that import is a logged operation, similar to
doing your own inserts, while load bypasses the logs and writes the
data directly into the table. The note about hierarchies refers to the
implementation of subtypes (typed tables) in DB2; it doesn't sound like
that's what you're doing.

Pros of each for your situation would probably look like:

import
- pros: best solution for concurrency and recoverability.
- cons: the slower of the two, and concurrent queries can see partial
data while it runs, because of the intermediate commits.

load
- pros: the fastest solution.
- cons: poor recoverability in your situation, plus concurrency
limitations (though when used with "ALLOW READ ACCESS", concurrency is
usually fine).
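For reference, here's a rough sketch of what each invocation looks like
from the CLP; the file and table names are made up for illustration:

```sql
-- logged inserts, committed in batches; a slower but recoverable path
IMPORT FROM data.del OF DEL COMMITCOUNT 1000 INSERT INTO myschema.mytab;

-- writes formatted pages directly into the table; minimal logging
LOAD FROM data.del OF DEL INSERT INTO myschema.mytab;
```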

Another question might be whether your concurrent queries will return
erroneous results if you load or import just a single table at a
time....
Ken

Nov 12 '05 #2

us****@kikobu.com wrote:
[snip]
> What does "loading data at the hierarchy level" imply?

The "hierarchy level" refers to typed table hierarchies. Nothing you
have to worry about.

Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Nov 12 '05 #3

"kenfar" <ke****@gmail.com> wrote:
[snip]


Keep in mind that since import does inserts, it could fire triggers defined
on the tables. Load will not fire any triggers.

If you're doing an import with a large amount of data, use the
COMMITCOUNT parm to commit every 1,000 rows or so. This will keep
the logs from filling up.
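Something like this, where the file, message file, and table names are
placeholders:

```sql
IMPORT FROM data.del OF DEL COMMITCOUNT 1000
   MESSAGES import.msg INSERT INTO myschema.mytab;
```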
Nov 12 '05 #4

"kenfar" <ke****@gmail.com> writes:
[snip]


A big con for Load that we ran into was:
- load would leave the tablespace in 'backup pending' mode, thus
requiring a backup before you could continue to use the table
- you could use the NONRECOVERABLE option to avoid this, but then you
couldn't rollforward any journals after the load
- load also needed to acquire a super-exclusive lock on the table

Once we went to production with our warehouse, we had to convert all of
our LOAD scripts to use IMPORT.

This is LUW 8.1.6; I don't know if 8.2 has the same restrictions...

Doug

Nov 12 '05 #5

Ian
Doug Crowson wrote:
> A big con for Load that we ran into was:
> - load would leave the tablespace in 'backup pending' mode, thus
>   requiring a backup to continue to use the table
> - you could use the NONRECOVERABLE option to avoid this, but then you
>   couldn't rollforward any journals after the load.

You need to use COPY YES in order to perform a recoverable load and
avoid placing the tablespace into backup pending state.

> - load also needed to acquire a super exclusive lock on the table

Yes, but if you are doing an online load, the Z-lock on the table is
only held for a short time.
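Putting those two points together, a recoverable online load would look
roughly like this; the table name and copy-image path are made up:

```sql
LOAD FROM data.del OF DEL
   INSERT INTO myschema.mytab
   COPY YES TO /db2/copyimages
   ALLOW READ ACCESS;
```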

Nov 12 '05 #6

>> A big con for Load that we ran into was:
>> - load would leave the tablespace in 'backup pending' mode, thus
>>   requiring a backup to continue to use the table
>> - you could use the NONRECOVERABLE option to avoid this, but then you
>>   couldn't rollforward any journals after the load.
>
> You need to use COPY YES in order to perform a recoverable load and
> avoid placing the tablespace into backup pending state.

Yeah, I try very hard to avoid using load on a transactional database.
In the case of my warehouse it is non-transactional, and the load files
are the backups: if a recovery is required we move the compressed files
from archive back to input, and the loader takes care of it. That
simplifies most things.
>> - load also needed to acquire a super exclusive lock on the table
>
> Yes, but if you are doing an online load, the Z-lock on the table is
> only held for a short time.


Yeah, I've only really had problems with load's allow read access when
the server was getting hit by a massive barrage of something like
60,000 queries driving canned reports. Other than that, a long lock
wait time of 120 seconds or so (assuming an average query duration of
5 seconds) has worked fine.

One other thing I forgot to mention: the insert_update import option is
really handy, and I'm now often using it for smaller-volume (< 100,000
row) table ETL operations, especially when concurrency is tricky.
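For anyone who hasn't used it: INSERT_UPDATE is just another action on
the import command. The target table needs a primary key, since
incoming rows that match on it are updated rather than inserted. A
sketch, with hypothetical file and table names:

```sql
IMPORT FROM delta.del OF DEL COMMITCOUNT 500
   INSERT_UPDATE INTO myschema.mytab;
```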

Nov 12 '05 #7

Ian
kenfar wrote:

> One other thing I forgot to mention: the insert_update import option
> is really handy, and i'm often now using it for smaller volume (<
> 100,000 row) table ETL operations. Especially when concurrency is
> tricky.


Obviously you've been around the block a few times, but from a
concurrency perspective, you do realize that IMPORT (by default)
takes an exclusive lock on the table it is writing to, right?
This was a MAJOR concurrency issue; I had to write a replacement
for the import utility to avoid this in V7.2.

One of the V8 fixpacks (finally!) allowed you to work around this
requirement, and FP9 added the 'ALLOW WRITE ACCESS' option to IMPORT.
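On FP9 or later, that option slots into the same command; roughly like
this, with the names being placeholders:

```sql
IMPORT FROM delta.del OF DEL
   ALLOW WRITE ACCESS COMMITCOUNT 500
   INSERT_UPDATE INTO myschema.mytab;
```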

Nov 12 '05 #8

Thanks for all the input; it has been really insightful. I've decided
to go with LOAD first (COPY YES) and see how long the job takes. There
will be hardly any users on the system at the time I do the load, so if
I can finish in under a minute, it's okay.

If this does not work, I'll wipe the table first (empty LOAD) and then
do an IMPORT. As mentioned, responsiveness is more of an issue than the
user seeing wrong/missing data for a few seconds.

This job is the only one that ever writes to the tables.
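In case it helps anyone else, the commands I intend to try look roughly
like this; paths and names are placeholders:

```sql
-- plan A: replace the data in one recoverable load
LOAD FROM data.del OF DEL
   REPLACE INTO myschema.mytab
   COPY YES TO /db2/copyimages;

-- plan B: empty the table via a load from /dev/null, then import
LOAD FROM /dev/null OF DEL REPLACE INTO myschema.mytab;
IMPORT FROM data.del OF DEL COMMITCOUNT 1000 INSERT INTO myschema.mytab;
```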

Thanks.

Morten

Nov 12 '05 #9


"... IMPORT (by default) takes an exclusive lock on the table it is
writing to" - if I do an IMPORT with the REPLACE option, and someone
tries to SELECT from the table while I'm importing, will their SELECT
wait until the next commit from the IMPORT, or until the whole job is
done?

Thanks.

Morten

Nov 12 '05 #10

us****@kikobu.com wrote:
> "... IMPORT (by default)
> takes an exclusive lock on the table it is writing to" - if I do an
> IMPORT with REPLACE option, and someone tries to SELECT from the table
> while I'm importing, their SELECT will wait until..? Next commit from
> the IMPORT or until the job is done?


Not positive on this one, but I don't think this will work for you: I
assume that the data previously in the table is completely wiped out by
the time of the first commit. So you would likely want this operation
to be all-or-nothing, either via load or by committing only once.

An alternative in this scenario might be the insert_update option -
which will only create row locks if you use allow write access (thanks
Ian for the reminder). Of course, that won't delete data.

ken

Nov 12 '05 #11

Thanks for the input. If the REPLACE option is used, IMPORT will
truncate the table when it starts (as far as I can tell from the docs).
The question now is whether IMPORT locks the table for the entire
process, or allows pending read processes to run in between commits.
I need to delete the existing data (hence the REPLACE option), so I'm
afraid an "all or nothing" scenario is very hard to achieve using
LOAD/IMPORT.

Nov 12 '05 #12

Welp... IMPORT does not allow me to read in between commits. But it
surprises me that it is so extremely slow. I have written a client in
Java which, using JDBC, is very much faster (20-50 times), and that
really surprises me. The Java client uses batching (100 inserts sent
over the wire at a time, then commit) and stored procedures; the IMPORT
job commits after every 500 rows. I guess the IMPORT job sends one
insert at a time across the wire.

Nov 12 '05 #13

Ian
us****@kikobu.com wrote:
[snip]
> I guess the IMPORT job sends one insert at a time across the wire.

Yes, by default IMPORT does one row per INSERT statement. You can say:

import ... of del MODIFIED BY COMPOUND=x ...

(I think 0 < x <= 100)

This allows multiple rows to be sent in each INSERT statement and can
often result in better performance.
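So a compound import would be invoked something like this, with the
file and table names made up:

```sql
IMPORT FROM data.del OF DEL MODIFIED BY COMPOUND=100
   INSERT INTO myschema.mytab;
```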
Nov 12 '05 #14

This discussion thread is closed; replies have been disabled.