470,614 Members | 1,578 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 470,614 developers. It's quick & easy.

SQL Server for data processing

I'm working on a system right now where I have a database (two,
actually, but one is discarded halfway through), but it's created
and used as part of a process (reporting), rather than as the
actual production data repository. I may be keeping the database
permanantly, but it would be completely read-only; once the
process is complete, the database will not change again. This has
me wanting to do a few things that are rather foreign to my usual
experience, and I don't know how many of them are supported.

In several cases, I'm summarizing one table into another by several
fields, and then updating the original table with an ID for the
summary row each source row was summarized into (e.g., I summarize
PlaceAndProductSummary into PlaceSummary, and then populate
PlaceSummaryID in PlaceAndProductSummary). The update of the
source table is much faster if the summary table has a clustered
index on the summarized fields, but all later access will be faster
if the clustered index is on the identity column. I've been
including an ORDER BY the summarized fields in the original insert,
so the identity column is in the same order as the summarized fields,
but I don't know of any way to take advantage of that in the
indexing declarations.

As another approach to the above situation, if I change the
clustered index on a table, and the rows happen to be in the
same order by both indexes, will the table still get rebuilt?

I will never do a roll-back in the process; if an action fails, I
want to raise an error and halt (and I haven't lost any data).
Is there any way to completely turn off logging?

Will I gain anything by marking the database as single-user?

Any indexes that I am not using while I populate the tables, I'm
adding at the end with FillFactor 100, to keep any slack out.
Is there a way to remove all the slack from everything else, at
the end of the process? During a backup operation would be fine.

Thanks,
Bill

Jul 23 '05 #1
2 1301
On Mon, 07 Feb 2005 04:54:43 GMT, William Cleveland wrote:

(snip)
In several cases, I'm summarizing one table into another by several
fields, and then updating the original table with an ID for the
summary row each source row was summarized into (e.g., I summarize
PlaceAndProductSummary into PlaceSummary, and then populate
PlaceSummaryID in PlaceAndProductSummary). The update of the
source table is much faster if the summary table has a clustered
index on the summarized fields, but all later access will be faster
if the clustered index is on the identity column. I've been
including an ORDER BY the summarized fields in the original insert,
so the identity column is in the same order as the summarized fields,
but I don't know of any way to take advantage of that in the
indexing declarations.
Hi William,

First, you should know that including ORDER BY in an INSERT .. SELECT
statement in meaningless. There is no way to guarantee that the identity
values get handed out in the order you specify. If you really must have
them in a specific order, the only safe way is to use a cursor - but since
IDENTITY is meant to be a **meaningless** unique code, this is a quite
questionable approach. Especially since there are more caveats to using an
identity for this goal. If the generated value has to have some logical
relation to the data, you'll have to calculate it, either on insert or on
retrieval.

I don't really understand the rest of this paragraph, though. It appears
that you are merely looking for maximum speed. If that's the case, there
might be other options to consider. Maybe, the update of the source table
will be equally fast (or even faster) if you use a covering nonclustered
index?

If you could show some more specifics about what exactly you are trying to
do, I might be able to give more specific comments.

As another approach to the above situation, if I change the
clustered index on a table, and the rows happen to be in the
same order by both indexes, will the table still get rebuilt?
My guess is that it is - it should! First because SQL Server doesn't know
that the rows are in order, so it will have to be checked anyway. Second
because a clustered index is more than just the table - the table is in
the leaf pages of the index; the root and the intermediate pages have to
be built as well.

To verify, I ran this code with the Show Execution Plan option turned on
(Ctrl-K in Query Analyzer). Both with or without the added nonclustered
index, the rebuilding of the clustered index had exactly the same plan as
the building of the first clustered index.

create table testit (a int not null, b int not null)
go
create unique clustered index i1 on testit(a)
-- create unique nonclustered index i2 on testit(b)
go
insert testit values (1, 1)
insert testit values (2, 2)
insert testit values (3, 3)
go
create unique clustered index i1 on testit(b) with drop_existing
go
drop table testit
go

I will never do a roll-back in the process; if an action fails, I
want to raise an error and halt (and I haven't lost any data).
Is there any way to completely turn off logging?
No, there is no way to turn off logging. It's not only used for rollback,
it's also used for trigger execution, for recovery after a system crash
and for backup and restore - and then I'm probably forgetting some.

Besides, *you* might never do a rollback, but for some errors, SQL Server
will issue a rollback.

Will I gain anything by marking the database as single-user?
Yes: you'll save the overhead of taking and releasing locks. The price you
pay is that you can't run multiple connections at once. Without the
single-user setting, you could still inspect some data in table A while a
long running query is updating table B. This is impossible in single-user
mode (in fact, it should have been called single-connection mode).

Any indexes that I am not using while I populate the tables, I'm
adding at the end with FillFactor 100, to keep any slack out.
Is there a way to remove all the slack from everything else, at
the end of the process? During a backup operation would be fine.


The only way I know is to rebuild all indexes with fillfactor 100.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Jul 23 '05 #2
Hugo Kornelis (hugo@pe_NO_rFact.in_SPAM_fo) writes:
First, you should know that including ORDER BY in an INSERT .. SELECT
statement in meaningless. There is no way to guarantee that the identity
values get handed out in the order you specify.


That is definitely not the same as saying that ORDER BY is meaningless.

If you say:

INSERT tbl (...)
SELECT ....
ORDER BY
OPTION (MAXDOP 1)

there is no guarantee that your IDENTITY values match - but your odds are
pretty good.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #3

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

26 posts views Thread by David W. Fenton | last post: by
5 posts views Thread by Ian | last post: by
1 post views Thread by gsalbertson | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.