472,096 Members | 1,390 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,096 software developers and data experts.

Bulk inserts to Data Warehouse - Best Practices?

Hello all,

I just started a new job this week and they complain about the length of
time it takes to load data into their data warehouse,
which they do once a month.

From what I can gather, they rebuild the indexes before the insert with an
80% Fillfactor, then insert the data (with the
indexes enabled), then rebuild the indexes with a 100% Fillfactor.

Most of my RDBMS experience is with a different product. We would have
disabled the indexes and Foreign Keys, loaded the data, then
re-enabled them, moving any records that violated the constraints into an
appropriate audit table to be checked after.

Can someone share with me what the accepted "best practices" are for loading
data efficiently into a data warehouse?

Any thoughts would be deeply appreciated.

Steve
Jul 20 '05 #1
2 2811
In article <kZ********************@news20.bellglobal.com>,
st********@yahoo.com says...
Hello all,

I just started a new job this week and they complain about the length of
time it takes to load data into their data warehouse,
which they do once a month.

From what I can gather, they rebuild the indexes before the insert with an
80% Fillfactor, then insert the data (with the
indexes enabled), then rebuild the indexes with a 100% Fillfactor.

Most of my RDBMS experience is with a different product. We would have
disabled the indexes and Foreign Keys, loaded the data, then
re-enabled them, moving any records that violated the constraints into an
appropriate audit table to be checked after.

Can someone share with me what the accepted "best practices" are for loading
data efficiently into a data warehouse?

Any thoughts would be deeply appreciated.


Your method, dropping indexes, unless part of a constraint, would be
proper. You could BCP the data into a flat table and then process it
from the flat table to the destination table after you clean it in the
flat table.

--
--
sp*********@rrohio.com
(Remove 999 to reply to me)
Jul 20 '05 #2
Here's a link to a DTS/BI best practices doc:
http://msdn.microsoft.com/library/de...tbpwithdts.asp

Every situation is different, with a lot depending on how much data is being
loaded and what (if any) indexes need to be in place to support the ETL
process. It seems a bit odd to rebuild indexes twice, though. Normally,
one drops all but the required indexes beforehand and recreates the other
(reporting) indexes afterward.

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Steve_CA" <st********@yahoo.com> wrote in message
news:kZ********************@news20.bellglobal.com. ..
Hello all,

I just started a new job this week and they complain about the length of
time it takes to load data into their data warehouse,
which they do once a month.

From what I can gather, they rebuild the indexes before the insert with an
80% Fillfactor, then insert the data (with the
indexes enabled), then rebuild the indexes with a 100% Fillfactor.

Most of my RDBMS experience is with a different product. We would have
disabled the indexes and Foreign Keys, loaded the data, then
re-enabled them, moving any records that violated the constraints into an
appropriate audit table to be checked after.

Can someone share with me what the accepted "best practices" are for
loading
data efficiently into a data warehouse?

Any thoughts would be deeply appreciated.

Steve

Jul 20 '05 #3

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

16 posts views Thread by D Witherspoon | last post: by
2 posts views Thread by Niraj | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.