473,408 Members | 1,826 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,408 software developers and data experts.

Update Versus Append

I have a database that is being used as sort of a reports data
warehouse. I use DTS packages to upload data from all the different
sources. Right now I have it truncating the tables and appending with
fresh data. I was considering using updates instead and my question was
which is more efficent?

Jul 23 '05 #1
3 3816
Ray
Depends on many factors ie. number/size of indexes, fill factors, using bulk
insert or insert, etc. But the most important are how much effort does it
take to identify changed records and what percentage of the data is
changing...
"Vithar" <cj******@cskauto.com> wrote in message
news:11**********************@z14g2000cwz.googlegr oups.com...
I have a database that is being used as sort of a reports data
warehouse. I use DTS packages to upload data from all the different
sources. Right now I have it truncating the tables and appending with
fresh data. I was considering using updates instead and my question was
which is more efficent?

Jul 23 '05 #2


Each table has 1 index and about 50,000 rows being inserted. There are
about 30 tables. I am using a basic insert. It is very hard to determine
which records have changed. The data is coming from a 3rd party
application. We are using this SQL Database as a datawarehouse for
custom reports and adhoc reports. There is a lot of messaging of data to
make it come out in a usuable format. The DTS package runs each night at
midnight but the users want it to run once an hour if possible. The way
I have it designed it takes about 15-20 minutes to run. If I schedule it
every hour and the record count grows I foresee it running almost
constantly so I was looking for a way to refine the process and make it
run faster. I wasn't sure the mechanics behind update versus insert
whether one is more "Costly" to run than the other.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Jul 23 '05 #3
Chris:

If you really want this to run hourly then you'll need a delta
calculation. Extracting only delta records does not seem very
practical since your data is comming from a third party source.
Therefore I would suggest using a checksum value against the raw data
(before any massaging) and storing this value in a series of staging
tables (the structure of these staging tables depends on the primary
key of the incoming data). Assuming all incoming data has a single
integer primary key you could create the following staging table:

CREATE TABLE IMPORT_CRC_NEW (
TABLE_NAME VARCHAR(128) NOT NULL
,PK_FIELD INT NOT NULL
,CRC_VALUE INT NOT NULL
)

To prepare a new load you should follow this general theme:
1) rename the IMPORT_CRC_NEW table to IMPORT_CRC_OLD (via sp_rename,
so you don't worry about more inserts)
2) rre-create the IMPORT_CRC_NEW table
3) bulk insert all new data into staging tables with their
individual formatting
4) for each source table insert into the IMPORT_CRC_NEW table with a
query like this:
INSERT INTO dbo.IMPORT_CRC_NEW
SELECT
'MyTableName'
,PK_FIELD
,CHECKSUM(*)
FROM dbo.SOURCE_TABLE_1
5) caclulate the rows that have changed by comparing the
IMPORT_CRC_NEW and IMPORT_CRC_OLD tables and store their primary keys
in another staging table (see below). You will need to capture New,
Updated and Deleted rows. The New and Deleted rows should be obvious
but the updated rows are calculated based on:
IMPORT_CRC_NEW.dbo.PK_FIELD = IMPORT_CRC_OLD.dbo.PK_FIELD
AND IMPORT_CRC_NEW.dbo.CRC_VALUE <>IMPORT_CRC_OLD.dbo.CRC_VALUE
(this can be done in a join or using a where exists clause)

/* Delta Table - Holds primary key values for items to be refreshed.
*/
CREATE TABLE IMPORT_DELTA_QUE (
TABLE_NAME VARCHAR(128) NOT NULL
,PK_FIELD INT NOT NULL
)

You would then perform your data transformations against only the
changed items.

This is a very rough outline of a delta process, hopefully it makes
sense. If not, please drop me an email and I'll try to explain it
further.

ji*********@and1.com

On 29 Dec 2004 11:50:27 -0600, Chris Johnson <cj******@cskauto.com>
wrote:


Each table has 1 index and about 50,000 rows being inserted. There are
about 30 tables. I am using a basic insert. It is very hard to determine
which records have changed. The data is coming from a 3rd party
application. We are using this SQL Database as a datawarehouse for
custom reports and adhoc reports. There is a lot of messaging of data to
make it come out in a usuable format. The DTS package runs each night at
midnight but the users want it to run once an hour if possible. The way
I have it designed it takes about 15-20 minutes to run. If I schedule it
every hour and the record count grows I foresee it running almost
constantly so I was looking for a way to refine the process and make it
run faster. I wasn't sure the mechanics behind update versus insert
whether one is more "Costly" to run than the other.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!


Jul 23 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: JMCN | last post by:
hi i need some advice on whether if it would be better to use an append query or an update query. here is the situation, i have linked another database table to my current database. then i...
1
by: Aaron | last post by:
Hello fellow programmers, I am trying to run an append/update query from code, a command button on a form initiates the queries. the format i am using is; ...
1
by: Russ | last post by:
I've been trying to get my head around this for 3 days now and it seems like everything I try does not work for one reason or another. I built a test page using the TabStrip and MultiPage controls....
1
by: BW | last post by:
Hello, I am attempting to to take edited values from a datagrid control and update the underlying database. Problem is that some of the edited values don quite seem to make it to my update sub....
4
by: MN | last post by:
I have to import a tab-delimited text file daily into Access through a macro. All of the data needs to be added to an existing table. Some of the data already exists but may be updated by the...
16
by: ARC | last post by:
Hello all, So I'm knee deep in this import utility program, and am coming up with all sorts of "gotcha's!". 1st off. On a "Find Duplicates Query", does anyone have a good solution for...
0
by: riyap | last post by:
Hi, can anyone guide me how to update a access db with datatables and string builder im trying to update 3 columns does sb.Append(drRel.Item("id").ToString() & ",") this means im i updating...
0
by: riyap | last post by:
Hi i have a question regarding a update in msaccess DB using string builder,DATA SET AND DATA RELATIONS can we do that in access DB i have a table in access i need to pass more than 1 record and...
4
by: dougmeece | last post by:
Morning Everyone... I have a table that needs to be append to and also updated. All the fields in the table are populated with data from the text boxes and combo boxes on a form. The Date...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.