By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,856 Members | 2,179 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,856 IT Pros & Developers. It's quick & easy.

Update Versus Append

P: n/a
I have a database that is being used as sort of a reports data
warehouse. I use DTS packages to upload data from all the different
sources. Right now I have it truncating the tables and appending with
fresh data. I was considering using updates instead and my question was
which is more efficent?

Jul 23 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
Ray
Depends on many factors ie. number/size of indexes, fill factors, using bulk
insert or insert, etc. But the most important are how much effort does it
take to identify changed records and what percentage of the data is
changing...
"Vithar" <cj******@cskauto.com> wrote in message
news:11**********************@z14g2000cwz.googlegr oups.com...
I have a database that is being used as sort of a reports data
warehouse. I use DTS packages to upload data from all the different
sources. Right now I have it truncating the tables and appending with
fresh data. I was considering using updates instead and my question was
which is more efficent?

Jul 23 '05 #2

P: n/a


Each table has 1 index and about 50,000 rows being inserted. There are
about 30 tables. I am using a basic insert. It is very hard to determine
which records have changed. The data is coming from a 3rd party
application. We are using this SQL Database as a datawarehouse for
custom reports and adhoc reports. There is a lot of messaging of data to
make it come out in a usuable format. The DTS package runs each night at
midnight but the users want it to run once an hour if possible. The way
I have it designed it takes about 15-20 minutes to run. If I schedule it
every hour and the record count grows I foresee it running almost
constantly so I was looking for a way to refine the process and make it
run faster. I wasn't sure the mechanics behind update versus insert
whether one is more "Costly" to run than the other.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Jul 23 '05 #3

P: n/a
Chris:

If you really want this to run hourly then you'll need a delta
calculation. Extracting only delta records does not seem very
practical since your data is comming from a third party source.
Therefore I would suggest using a checksum value against the raw data
(before any massaging) and storing this value in a series of staging
tables (the structure of these staging tables depends on the primary
key of the incoming data). Assuming all incoming data has a single
integer primary key you could create the following staging table:

CREATE TABLE IMPORT_CRC_NEW (
TABLE_NAME VARCHAR(128) NOT NULL
,PK_FIELD INT NOT NULL
,CRC_VALUE INT NOT NULL
)

To prepare a new load you should follow this general theme:
1) rename the IMPORT_CRC_NEW table to IMPORT_CRC_OLD (via sp_rename,
so you don't worry about more inserts)
2) rre-create the IMPORT_CRC_NEW table
3) bulk insert all new data into staging tables with their
individual formatting
4) for each source table insert into the IMPORT_CRC_NEW table with a
query like this:
INSERT INTO dbo.IMPORT_CRC_NEW
SELECT
'MyTableName'
,PK_FIELD
,CHECKSUM(*)
FROM dbo.SOURCE_TABLE_1
5) caclulate the rows that have changed by comparing the
IMPORT_CRC_NEW and IMPORT_CRC_OLD tables and store their primary keys
in another staging table (see below). You will need to capture New,
Updated and Deleted rows. The New and Deleted rows should be obvious
but the updated rows are calculated based on:
IMPORT_CRC_NEW.dbo.PK_FIELD = IMPORT_CRC_OLD.dbo.PK_FIELD
AND IMPORT_CRC_NEW.dbo.CRC_VALUE <>IMPORT_CRC_OLD.dbo.CRC_VALUE
(this can be done in a join or using a where exists clause)

/* Delta Table - Holds primary key values for items to be refreshed.
*/
CREATE TABLE IMPORT_DELTA_QUE (
TABLE_NAME VARCHAR(128) NOT NULL
,PK_FIELD INT NOT NULL
)

You would then perform your data transformations against only the
changed items.

This is a very rough outline of a delta process, hopefully it makes
sense. If not, please drop me an email and I'll try to explain it
further.

ji*********@and1.com

On 29 Dec 2004 11:50:27 -0600, Chris Johnson <cj******@cskauto.com>
wrote:


Each table has 1 index and about 50,000 rows being inserted. There are
about 30 tables. I am using a basic insert. It is very hard to determine
which records have changed. The data is coming from a 3rd party
application. We are using this SQL Database as a datawarehouse for
custom reports and adhoc reports. There is a lot of messaging of data to
make it come out in a usuable format. The DTS package runs each night at
midnight but the users want it to run once an hour if possible. The way
I have it designed it takes about 15-20 minutes to run. If I schedule it
every hour and the record count grows I foresee it running almost
constantly so I was looking for a way to refine the process and make it
run faster. I wasn't sure the mechanics behind update versus insert
whether one is more "Costly" to run than the other.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!


Jul 23 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.