472,145 Members | 1,757 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,145 software developers and data experts.

Transform/transfer 50Gb - how to do it fast?

Hello!
I have an big table with 50Gb data, written som functions
that cleanup data and want to do something like this

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I'm also plan to make newtable partioned (before insert).

But how could i get the insert as fast as possible?
Greetings
Bjorn D. Jensen

Apr 28 '07 #1
18 3113
Hi,

The quickest way to do this is to use INSERT INTO...SELECT FROM as it
is a non-logged operation

regards,

Malc

B D Jensen wrote:
Hello!
I have an big table with 50Gb data, written som functions
that cleanup data and want to do something like this

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I'm also plan to make newtable partioned (before insert).

But how could i get the insert as fast as possible?
Greetings
Bjorn D. Jensen
Apr 28 '07 #2
B D Jensen (bj************@gmail.com) writes:
I have an big table with 50Gb data, written som functions
that cleanup data and want to do something like this

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I'm also plan to make newtable partioned (before insert).

But how could i get the insert as fast as possible?
Exactly what is in those functions? Do they perform data access? Are
they written in T-SQL or in the CLR? I ask, because they could have a
great impact on performance.

Apart from that, there are a couple of possible strategies for this
situation. One is SELECT INTO, but since you plan to make the new
table partitioned, I don't think SELECT INTO is good for this. (SELECT
INTO creates a new table.)

Another is to use BCP to first unload the table to a file. You would
then use queryout or a view with your functions, so what get on file
is the cleaned-up version. Then you use BCP to load the data into the
new table. The key here is that there should be no indexes on the table
and it should be empty. In this case the bulk-load is minimally logged.
Of course, you also need to account for the time it takes to create
the indexes.

And the final option is to use a plain INSERT. But a single INSERT
statement will not be good for your transaction log. It's better to
batch and insert, say, 100000 rows at a time, preferrably with the database
set to simple recovery. You should batch on the clustered index of
the old table:

SELECT @start = 1
WHILE EXISTS (SELECT * FROM oldtable WHERE clustercol >= @start)
BEGIN
INSERT newtable (...)
SELECT ...
FROM oldtable
WHERE clustercol >= @start AND clustercol < @start - 100000
SELECT @start = @start + 100000
END

Here the actual increment would depend on the nature of your clustered
key. If it's a date, maybe taking one month at a time is a good idea.

If new the table will have the same clustered index as the old table,
have the clustered index in place when run the above, but wait with
adding non-clustered indexes until you have the data in place.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Apr 28 '07 #3
Mork69 (ml****@bigfoot.com) writes:
The quickest way to do this is to use INSERT INTO...SELECT FROM as it
is a non-logged operation
This is not correct. INSERT SELECT FROM is a fully-logged operation. You are
thinking of SELECT INTO which is a minimally logged operation. That is,
all that is logged are the extent allocations. There are no write operations
in SQL Server that are entirely non-logged.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Apr 28 '07 #4
>
Exactly what is in those functions? Do they perform data access? Are
they written in T-SQL or in the CLR? I ask, because they could have a
great impact on performance.
Hello Erland!
Functions are written in T-SQL (i also wrote them in CLR, but in this
case they were slower). The original columns have incorrect datatypes,
that uses too much storage,
so the functions check that values are in correct domain and if not
they return null
- what is a correct result, because the values then are physical
impossible.

I wondered why you only wrote that I can't use "select into" for
patitioned tables.
I also expected I must create table first (and then not being able to
use 'select into')
because of the new datatypes - but of course I could write:
select id, cast(func1(col1) as <datatype>) into newtbl from oldtbl

Doing this multiple times with the appropiate where-clause
followed by partion switches maybe will be the solution; I'll
investigate...

.... but if you have some comments let me hear ;-)
Greetings
Bjorn D. Jensen
bj************@gmail.com
P.S. I already know your website, thanks for all that good
information!

Apr 29 '07 #5
B D Jensen (bj************@gmail.com) writes:
Functions are written in T-SQL (i also wrote them in CLR, but in this
case they were slower). The original columns have incorrect datatypes,
that uses too much storage, so the functions check that values are in
correct domain and if not they return null - what is a correct result,
because the values then are physical impossible.
I would recommend that you have the expressions inline, at least if
you desire to cut down execution time.
I wondered why you only wrote that I can't use "select into" for
patitioned tables.
I assumed that it is not possible to create a partitioned tables from
existing ones. But I have not worked much with partitioned tables, so
I could be wrong.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Apr 29 '07 #6
Hi again!
As I see the cast is not needed, because the functions return correct
datatype.
Is it that what you mean with "inline"??

I think there is another problem: the original table is not in the
right filegroup
and if I understand it right, It must be or must be to make switch
ultra fast;
I'll investigate...
Greetings
Bjorn

Apr 29 '07 #7
B D Jensen (bj************@gmail.com) writes:
As I see the cast is not needed, because the functions return correct
datatype.
Is it that what you mean with "inline"??
I don't know what your functions do, but it seemed from your description
that I could expect something like:

CREATE FUNCTION makesmaller(@x bigint) RETURNS tinyint AS
BEGIN
RETURN (CASE WHEN @x BETWEEN 0 AND 255 THEN @x ELSE NULL END)
END

Then in your INSERT operation you would rather write:

SELECT CASE WHEN bigcol BETWEEN 0 AND 255 THEN bigcol ELSE NULL END,
...

then

SELECT dbo.makesmaller(bigcol), ...

there is an overhead for the call, although it seems to be a lot less
in SQL 2005 than in SQL 2000.
I think there is another problem: the original table is not in the right
filegroup and if I understand it right, It must be or must be to make
switch ultra fast;
Yes, but as you are about to ditch the original table that is not much
of an issue, or?
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Apr 29 '07 #8
Hi!
You nearly guessed on of my functions;
but I use <= and >= in stead of 'between'.

I didn't understand the last part about "ditch" (what means that?).
Will the use of functions make the select into very slow?

Greetings
Bjorn

Apr 29 '07 #9
B D Jensen (bj************@gmail.com) writes:
I didn't understand the last part about "ditch" (what means that?).
To ditch = slänga, kasta, göra sig av med.
Will the use of functions make the select into very slow?
Slower. I cannot say how much slower, but I would never use functions for
this situation. Since this appears to be a one-off, code maintainability
does not seem to be important.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Apr 29 '07 #10
I assumed that it is not possible to create a partitioned tables from
existing ones. But I have not worked much with partitioned tables, so
I could be wrong.
It is possible to move a non-partitioned table (actually a single partition)
into a partitioned table with ALTER TABLE...SWITCH PARTITION. The
source/target table must have the same schema (including indexes) and
table/indexes must reside on the same , filegroup(s). Also, the source
table must have a check constraint on the partitioning column to ensure data
is within the target partition boundaries.

One caveat is that the index stats are not updated when data is switched
into the partitioned table so it's probably a good idea to update stats
after SWITCH.

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Erland Sommarskog" <es****@sommarskog.sewrote in message
news:Xn**********************@127.0.0.1...
>B D Jensen (bj************@gmail.com) writes:
>Functions are written in T-SQL (i also wrote them in CLR, but in this
case they were slower). The original columns have incorrect datatypes,
that uses too much storage, so the functions check that values are in
correct domain and if not they return null - what is a correct result,
because the values then are physical impossible.

I would recommend that you have the expressions inline, at least if
you desire to cut down execution time.
>I wondered why you only wrote that I can't use "select into" for
patitioned tables.

I assumed that it is not possible to create a partitioned tables from
existing ones. But I have not worked much with partitioned tables, so
I could be wrong.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Apr 29 '07 #11
Dan Guzman (gu******@nospam-online.sbcglobal.net) writes:
It is possible to move a non-partitioned table (actually a single
partition) into a partitioned table with ALTER TABLE...SWITCH PARTITION.
The source/target table must have the same schema (including indexes)
and table/indexes must reside on the same , filegroup(s). Also, the
source table must have a check constraint on the partitioning column to
ensure data is within the target partition boundaries.

Ah, that's great. That would it would be possible for Bjørn to create his
partitions with SELECT INTO, add the required index and constraints nd
then glue them together.

Thanks Dan for the information. ... I really need to start playing with
partitioning some day.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Apr 29 '07 #12
Hi Dan!
Thanks for the details about requirements.
But I'm afraid I then must create the newtbl first,
because the old table is in the Primary filegroup.

And as I see there is no way for saying:
select id, func1(col1) into newtbl MYNEWFG from oldtbl.

So I think I have look at unload/load now....
Best regards
Bjorn D. Jensen

Apr 30 '07 #13
B D Jensen (bj************@gmail.com) writes:
Thanks for the details about requirements.
But I'm afraid I then must create the newtbl first,
because the old table is in the Primary filegroup.

And as I see there is no way for saying:
select id, func1(col1) into newtbl MYNEWFG from oldtbl.

So I think I have look at unload/load now....
But doesn't ALTER DATABASE permit you to specify a different filegroup as
the default filegroup? You could do that, and then your SELECT INTO tables
should end up there. At least that is what I would expect.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Apr 30 '07 #14
On 30 Apr., 09:34, Erland Sommarskog <esq...@sommarskog.sewrote:
B D Jensen (bjorn.d.jen...@gmail.com) writes:
Thanks for the details about requirements.
But I'm afraid I then must create the newtbl first,
because the old table is in the Primary filegroup.
And as I see there is no way for saying:
select id, func1(col1) into newtbl MYNEWFG from oldtbl.
So I think I have look at unload/load now....

But doesn't ALTER DATABASE permit you to specify a different filegroup as
the default filegroup? You could do that, and then your SELECT INTO tables
should end up there. At least that is what I would expect.

--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se

Books Online for SQL Server 2005 athttp://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books...
Books Online for SQL Server 2000 athttp://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
very, very good point!
/Bjorn

Apr 30 '07 #15
On 29 Apr., 18:44, Erland Sommarskog <esq...@sommarskog.sewrote:
B D Jensen (bjorn.d.jen...@gmail.com) writes:
I didn't understand the last part about "ditch" (what means that?).

To ditch = slänga, kasta, göra sig av med.
Will the use of functions make the select into very slow?

Slower. I cannot say how much slower, but I would never use functions for
this situation. Since this appears to be a one-off, code maintainability
does not seem to be important.

--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se

Books Online for SQL Server 2005 athttp://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books...
Books Online for SQL Server 2000 athttp://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
I made a comparison for the case of converting to tinyint
and wrote a loop going from -1mio to +1mio
using TSQL-function: 72 seconds
using directly between (not in a function): 70 seconds ; it's in
between ;^)
using directly <= and >= : 67seconds

(and CLR-function: 2min 27seconds)

So you are right not writing it in seperate function is faster (in
this case),
so it depends (... ;^) on the situation if the difference is too
costly.

Maybe it's one time only, but if you think you can reuse it, then at
all there is less typing and more
important: your code is much more readable, because it becomes shorter
and much more
natural too read. And if one finds an better implementation, you can
just replace it without affecting depending code.

Again: it depends ;^)
Best regards
Bjørn

Apr 30 '07 #16
B D Jensen (bj************@gmail.com) writes:
I made a comparison for the case of converting to tinyint and wrote a
loop going from -1mio to +1mio using TSQL-function: 72 seconds using
directly between (not in a function): 70 seconds ; it's in between ;^)
using directly <= and >= : 67seconds

So you are right not writing it in seperate function is faster (in
this case),
I find it difficult to believe that there is any case where a scalar
T-SQL UDF would be faster.

Then again, with the numbers you present it's dubious whether you actually
have found a significant difference.
(and CLR-function: 2min 27seconds)
With a more complex operation, you would have had a different outcome.
I once did a test where I had to convert zoned numbers with fixed
decimal from an AS400 system. In that case a CLR function was faster
than all T-SQL solutions. I think I have heard that when you have more
than four operations, the CLR pays off.
Maybe it's one time only, but if you think you can reuse it, then at all
there is less typing and more important: your code is much more
readable, because it becomes shorter and much more natural too read.
Or you sit asking yourself "wonder what this function does".
Again: it depends ;^)
True. That's the answer to almost all performance questions.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Apr 30 '07 #17
On Apr 28, 4:55 pm, Erland Sommarskog <esq...@sommarskog.sewrote:
This is not correct. INSERT SELECT FROM is a fully-logged operation. You are
thinking of SELECT INTO which is a minimally logged operation. That is,
all that is logged are the extent allocations. There are no write operations
in SQL Server that are entirely non-logged.
Yes, sorry, I was clearly having a bad day, SELECT INTO is what I
meant.

Regarding the statement that it is a "non logged" operation -
obviously all operations write to the transaction log in some way, I
was just using the term that is in general use that was erroneously
started by Books Online. In any case, as the only records that are
written are merely to log a table's creation the difference is
somewhat irrelevant in this context.

May 1 '07 #18
Mork69 (ml****@bigfoot.com) writes:
Regarding the statement that it is a "non logged" operation -
obviously all operations write to the transaction log in some way, I
was just using the term that is in general use that was erroneously
started by Books Online.
Actually, not even that. Books Online for SQL 2000, is very careful to
talk about minimally logged. I looked in Books Online for SQL 6.5, which
indeed talks about non-logged, but that was loooong ago. And the
architecture was different way backk then.
In any case, as the only records that are written are merely to log a
table's creation.
Not only. The extent allocations are also logged. If they weren't and
the operation failed on illegal convert operation half-way through, you
would be left with a table that would have a couple of rows in it.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
May 1 '07 #19

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

5 posts views Thread by Mr. x | last post: by
8 posts views Thread by Luther Miller | last post: by
4 posts views Thread by WStoreyII | last post: by
10 posts views Thread by David | last post: by
reply views Thread by Saiars | last post: by
reply views Thread by leo001 | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.