Bytes | Software Development & Data Engineering Community

Transform/transfer 50Gb - how to do it fast?

Hello!
I have a big table with 50 GB of data and have written some functions
that clean up the data. I want to do something like this:

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I also plan to make newtable partitioned (before the insert).

But how can I get the insert to run as fast as possible?
Greetings
Bjorn D. Jensen

Apr 28 '07 #1
Hi,

The quickest way to do this is to use INSERT INTO...SELECT FROM, as it
is a non-logged operation.

regards,

Malc

Apr 28 '07 #2
B D Jensen (bj************@gmail.com) writes:
I have a big table with 50 GB of data and have written some functions
that clean up the data. I want to do something like this:

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I also plan to make newtable partitioned (before the insert).

But how can I get the insert to run as fast as possible?
Exactly what is in those functions? Do they perform data access? Are
they written in T-SQL or in the CLR? I ask, because they could have a
great impact on performance.

Apart from that, there are a couple of possible strategies for this
situation. One is SELECT INTO, but since you plan to make the new
table partitioned, I don't think SELECT INTO is good for this. (SELECT
INTO creates a new table.)

Another is to use BCP to first unload the table to a file. You would
then use queryout with a query, or a view, that applies your functions,
so what ends up in the file is the cleaned-up version. Then you use BCP
to load the data into the new table. The key here is that there should
be no indexes on the target table and it should be empty. In that case
the bulk load is minimally logged.
Of course, you also need to account for the time it takes to create
the indexes.
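As a sketch, that BCP round trip could look like this (the server name, database name, and file name are placeholder assumptions, not from the thread):

```
rem Unload the cleaned-up rows to a native-format file via queryout.
bcp "SELECT id, dbo.func1(col1), dbo.func2(col2) FROM mydb.dbo.oldtable" queryout cleaned.dat -n -T -S myserver

rem Load into the empty, index-free target table; the TABLOCK hint
rem allows the bulk load to be minimally logged.
bcp mydb.dbo.newtable in cleaned.dat -n -T -S myserver -h "TABLOCK"
```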

And the final option is to use a plain INSERT. But a single INSERT
statement will not be good for your transaction log. It's better to
batch the insert, say, 100000 rows at a time, preferably with the database
set to simple recovery. You should batch on the clustered index of
the old table:

SELECT @start = 1
WHILE EXISTS (SELECT * FROM oldtable WHERE clustercol >= @start)
BEGIN
   INSERT newtable (...)
   SELECT ...
   FROM oldtable
   WHERE clustercol >= @start AND clustercol < @start + 100000
   SELECT @start = @start + 100000
END

Here the actual increment would depend on the nature of your clustered
key. If it's a date, maybe taking one month at a time is a good idea.

If the new table will have the same clustered index as the old table,
have the clustered index in place when you run the above, but wait with
adding non-clustered indexes until you have the data in place.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Apr 28 '07 #3
Mork69 (ml****@bigfoot.com) writes:
The quickest way to do this is to use INSERT INTO...SELECT FROM as it
is a non-logged operation
This is not correct. INSERT SELECT FROM is a fully-logged operation. You are
thinking of SELECT INTO which is a minimally logged operation. That is,
all that is logged are the extent allocations. There are no write operations
in SQL Server that are entirely non-logged.
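For clarity, the two statements being contrasted would look like this (a sketch using the table and function names from the original post):

```sql
-- SELECT INTO: creates newtable on the fly; minimally logged under
-- the simple or bulk-logged recovery model.
SELECT id, dbo.func1(col1) AS col1, dbo.func2(col2) AS col2
INTO   newtable
FROM   oldtable;

-- INSERT ... SELECT: newtable must already exist; every inserted row
-- is fully logged in the transaction log.
INSERT INTO newtable (id, col1, col2)
SELECT id, dbo.func1(col1), dbo.func2(col2)
FROM   oldtable;
```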
Apr 28 '07 #4
Exactly what is in those functions? Do they perform data access? Are
they written in T-SQL or in the CLR? I ask, because they could have a
great impact on performance.
Hello Erland!
Functions are written in T-SQL (I also wrote them in CLR, but in this
case they were slower). The original columns have incorrect datatypes
that use too much storage, so the functions check that values are in
the correct domain and return null if not
- which is a correct result, because such values are physically
impossible.

I wondered why you only wrote that I can't use "select into" for
partitioned tables.
I also expected I must create the table first (and then not be able to
use 'select into') because of the new datatypes - but of course I could write:
select id, cast(func1(col1) as <datatype>) into newtbl from oldtbl

Doing this multiple times with the appropriate where-clause,
followed by partition switches, maybe will be the solution; I'll
investigate...

... but if you have some comments, let me hear ;-)
Greetings
Bjorn D. Jensen
bj************@gmail.com
P.S. I already know your website, thanks for all that good
information!

Apr 29 '07 #5
B D Jensen (bj************@gmail.com) writes:
Functions are written in T-SQL (I also wrote them in CLR, but in this
case they were slower). The original columns have incorrect datatypes
that use too much storage, so the functions check that values are in
the correct domain and return null if not - which is a correct result,
because such values are physically impossible.
I would recommend that you have the expressions inline, at least if
you desire to cut down execution time.
I wondered why you only wrote that I can't use "select into" for
partitioned tables.
I assumed that it is not possible to create a partitioned table from
an existing one. But I have not worked much with partitioned tables, so
I could be wrong.
Apr 29 '07 #6
Hi again!
As I see it, the cast is not needed, because the functions return the
correct datatype.
Is that what you mean by "inline"??

I think there is another problem: the original table is not in the
right filegroup, and if I understand it right, it must be for the
partition switch to be ultra fast;
I'll investigate...
Greetings
Bjorn

Apr 29 '07 #7
B D Jensen (bj************@gmail.com) writes:
As I see it, the cast is not needed, because the functions return the
correct datatype.
Is that what you mean by "inline"??
I don't know what your functions do, but it seemed from your description
that I could expect something like:

CREATE FUNCTION makesmaller(@x bigint) RETURNS tinyint AS
BEGIN
   RETURN (CASE WHEN @x BETWEEN 0 AND 255 THEN @x ELSE NULL END)
END

Then in your INSERT operation you would rather write:

SELECT CASE WHEN bigcol BETWEEN 0 AND 255 THEN bigcol ELSE NULL END,
...

than

SELECT dbo.makesmaller(bigcol), ...

There is an overhead for the call, although it seems to be a lot less
in SQL 2005 than in SQL 2000.
I think there is another problem: the original table is not in the right
filegroup, and if I understand it right, it must be for the partition
switch to be ultra fast;
Yes, but as you are about to ditch the original table that is not much
of an issue, or?
Apr 29 '07 #8
Hi!
You nearly guessed one of my functions;
but I use <= and >= instead of 'between'.

I didn't understand the last part about "ditch" (what does that mean?).
Will the use of functions make the select into very slow?

Greetings
Bjorn

Apr 29 '07 #9
B D Jensen (bj************@gmail.com) writes:
I didn't understand the last part about "ditch" (what does that mean?).
To ditch = slänga, kasta, göra sig av med (Swedish for "throw away, discard, get rid of").
Will the use of functions make the select into very slow?
Slower. I cannot say how much slower, but I would never use functions for
this situation. Since this appears to be a one-off, code maintainability
does not seem to be important.
Apr 29 '07 #10
I assumed that it is not possible to create a partitioned table from
an existing one. But I have not worked much with partitioned tables, so
I could be wrong.
It is possible to move a non-partitioned table (actually a single partition)
into a partitioned table with ALTER TABLE...SWITCH PARTITION. The
source/target table must have the same schema (including indexes) and
table/indexes must reside on the same filegroup(s). Also, the source
table must have a check constraint on the partitioning column to ensure data
is within the target partition boundaries.

One caveat is that the index stats are not updated when data is switched
into the partitioned table so it's probably a good idea to update stats
after SWITCH.
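A minimal sketch of such a switch, assuming a hypothetical partition function, scheme, and boundary values (none of these names come from the thread):

```sql
-- Hypothetical partition setup: ranges on id, all on filegroup MYNEWFG.
CREATE PARTITION FUNCTION pf_id (int)
    AS RANGE RIGHT FOR VALUES (100000, 200000);
CREATE PARTITION SCHEME ps_id
    AS PARTITION pf_id ALL TO (MYNEWFG);

CREATE TABLE newtable (id int NOT NULL, col1 tinyint NULL) ON ps_id (id);

-- Staging table: same schema, same filegroup, plus a check constraint
-- matching the boundaries of the target partition.
CREATE TABLE stage2 (id int NOT NULL, col1 tinyint NULL,
    CONSTRAINT ck_stage2 CHECK (id >= 100000 AND id < 200000)) ON MYNEWFG;

-- ... bulk load stage2 and build any indexes newtable has ...

ALTER TABLE stage2 SWITCH TO newtable PARTITION 2;  -- metadata-only move
UPDATE STATISTICS newtable;  -- SWITCH does not refresh index stats
```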

--
Hope this helps.

Dan Guzman
SQL Server MVP

Apr 29 '07 #11
Dan Guzman (gu******@nospam-online.sbcglobal.net) writes:
It is possible to move a non-partitioned table (actually a single
partition) into a partitioned table with ALTER TABLE...SWITCH PARTITION.
The source/target table must have the same schema (including indexes)
and table/indexes must reside on the same filegroup(s). Also, the
source table must have a check constraint on the partitioning column to
ensure data is within the target partition boundaries.

Ah, that's great. That means it would be possible for Bjørn to create his
partitions with SELECT INTO, add the required indexes and constraints, and
then glue them together.

Thanks, Dan, for the information. ... I really need to start playing with
partitioning some day.
Apr 29 '07 #12
Hi Dan!
Thanks for the details about the requirements.
But I'm afraid I then must create the newtbl first,
because the old table is in the Primary filegroup.

And as I see it, there is no way of saying:
select id, func1(col1) into newtbl MYNEWFG from oldtbl.

So I think I have to look at unload/load now....
Best regards
Bjorn D. Jensen

Apr 30 '07 #13
B D Jensen (bj************@gmail.com) writes:
Thanks for the details about the requirements.
But I'm afraid I then must create the newtbl first,
because the old table is in the Primary filegroup.

And as I see it, there is no way of saying:
select id, func1(col1) into newtbl MYNEWFG from oldtbl.

So I think I have to look at unload/load now....
But doesn't ALTER DATABASE permit you to specify a different filegroup as
the default filegroup? You could do that, and then your SELECT INTO tables
should end up there. At least that is what I would expect.
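A sketch of that approach (the database name is an assumption; the filegroup name is from the thread):

```sql
-- Make MYNEWFG the default filegroup so SELECT INTO creates its
-- target table there.
ALTER DATABASE mydb MODIFY FILEGROUP MYNEWFG DEFAULT;

SELECT id, dbo.func1(col1) AS col1
INTO   newtbl   -- ends up on MYNEWFG, the current default
FROM   oldtbl;

-- Restore the original default afterwards.
ALTER DATABASE mydb MODIFY FILEGROUP [PRIMARY] DEFAULT;
```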
Apr 30 '07 #14
On 30 Apr., 09:34, Erland Sommarskog <esq...@sommarskog.se> wrote:
But doesn't ALTER DATABASE permit you to specify a different filegroup as
the default filegroup? You could do that, and then your SELECT INTO tables
should end up there. At least that is what I would expect.
very, very good point!
/Bjorn

Apr 30 '07 #15
On 29 Apr., 18:44, Erland Sommarskog <esq...@sommarskog.se> wrote:
Slower. I cannot say how much slower, but I would never use functions for
this situation. Since this appears to be a one-off, code maintainability
does not seem to be important.
I made a comparison for the case of converting to tinyint
and wrote a loop going from -1 million to +1 million:
using a T-SQL function: 72 seconds
using 'between' directly (not in a function): 70 seconds; it's in
between ;^)
using <= and >= directly: 67 seconds

(and the CLR function: 2 min 27 seconds)

So you are right: not writing it in a separate function is faster (in
this case),
so it depends (... ;^) on the situation whether the difference is too
costly.
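A timing harness along those lines might look like this (a sketch; dbo.makesmaller is the hypothetical function named earlier in the thread):

```sql
DECLARE @start datetime, @x bigint, @result tinyint;
SELECT @start = GETDATE(), @x = -1000000;

WHILE @x <= 1000000
BEGIN
    -- inline version; swap in dbo.makesmaller(@x) to time the UDF
    SELECT @result = CASE WHEN @x >= 0 AND @x <= 255 THEN @x ELSE NULL END;
    SELECT @x = @x + 1;
END;

SELECT DATEDIFF(second, @start, GETDATE()) AS elapsed_seconds;
```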

Maybe it's one time only, but if you think you can reuse it, then after
all there is less typing and, more importantly, your code is much more
readable, because it becomes shorter and more natural to read. And if
one finds a better implementation, you can just replace it without
affecting dependent code.

Again: it depends ;^)
Best regards
Bjørn

Apr 30 '07 #16
B D Jensen (bj************@gmail.com) writes:
I made a comparison for the case of converting to tinyint and wrote a
loop going from -1 million to +1 million: using a T-SQL function: 72
seconds; using 'between' directly (not in a function): 70 seconds;
using <= and >= directly: 67 seconds

So you are right: not writing it in a separate function is faster (in
this case),
I find it difficult to believe that there is any case where a scalar
T-SQL UDF would be faster.

Then again, with the numbers you present it's dubious whether you actually
have found a significant difference.
(and the CLR function: 2 min 27 seconds)
With a more complex operation, you would have had a different outcome.
I once did a test where I had to convert zoned numbers with fixed
decimal from an AS400 system. In that case a CLR function was faster
than all T-SQL solutions. I think I have heard that when you have more
than four operations, the CLR pays off.
Maybe it's one time only, but if you think you can reuse it, then after
all there is less typing and, more importantly, your code is much more
readable, because it becomes shorter and more natural to read.
Or you sit asking yourself "wonder what this function does".
Again: it depends ;^)
True. That's the answer to almost all performance questions.
Apr 30 '07 #17
On Apr 28, 4:55 pm, Erland Sommarskog <esq...@sommarskog.se> wrote:
This is not correct. INSERT SELECT FROM is a fully-logged operation. You are
thinking of SELECT INTO which is a minimally logged operation. That is,
all that is logged are the extent allocations. There are no write operations
in SQL Server that are entirely non-logged.
Yes, sorry, I was clearly having a bad day, SELECT INTO is what I
meant.

Regarding the statement that it is a "non logged" operation -
obviously all operations write to the transaction log in some way, I
was just using the term that is in general use that was erroneously
started by Books Online. In any case, as the only records that are
written are merely to log a table's creation, the difference is
somewhat irrelevant in this context.

May 1 '07 #18
Mork69 (ml****@bigfoot.com) writes:
Regarding the statement that it is a "non logged" operation -
obviously all operations write to the transaction log in some way, I
was just using the term that is in general use that was erroneously
started by Books Online.
Actually, not even that. Books Online for SQL 2000 is very careful to
talk about minimally logged. I looked in Books Online for SQL 6.5, which
indeed talks about non-logged, but that was loooong ago. And the
architecture was different way back then.
In any case, as the only records that are written are merely to log a
table's creation.
Not only. The extent allocations are also logged. If they weren't, and
the operation failed on an illegal convert operation half-way through,
you would be left with a table that had a couple of rows in it.
May 1 '07 #19
