Bytes | Software Development & Data Engineering Community

Transform/transfer 50Gb - how to do it fast?

Hello!
I have a big table with 50 GB of data and have written some functions
that clean up the data. I want to do something like this:

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I also plan to make newtable partitioned (before the insert).

But how can I get the insert to run as fast as possible?
Greetings
Bjorn D. Jensen

Apr 28 '07 #1
Hi,

The quickest way to do this is to use INSERT INTO...SELECT FROM, as it
is a non-logged operation.

regards,

Malc

Apr 28 '07 #2
B D Jensen (bj************@gmail.com) writes:
I have a big table with 50 GB of data and have written some functions
that clean up the data. I want to do something like this:

insert into newtable
select id, func1(col1), func2(col2)
from oldtable;

I also plan to make newtable partitioned (before the insert).

But how can I get the insert to run as fast as possible?
Exactly what is in those functions? Do they perform data access? Are
they written in T-SQL or in the CLR? I ask, because they could have a
great impact on performance.

Apart from that, there are a couple of possible strategies for this
situation. One is SELECT INTO, but since you plan to make the new
table partitioned, I don't think SELECT INTO is good for this. (SELECT
INTO creates a new table.)

Another is to use BCP to first unload the table to a file. You would
then use queryout with a query, or a view, that applies your functions,
so what ends up in the file is the cleaned-up version. Then you use BCP
to load the data into the new table. The key here is that there should
be no indexes on the target table and it should be empty. In that case
the bulk load is minimally logged.
Of course, you also need to account for the time it takes to create
the indexes.
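As a sketch, that BCP round trip could look like this (the server name, database name, and file name are placeholder assumptions, not from the thread):

```
rem Unload the cleaned-up rows to a native-format file via queryout.
bcp "SELECT id, dbo.func1(col1), dbo.func2(col2) FROM mydb.dbo.oldtable" queryout cleaned.dat -n -T -S myserver

rem Load into the empty, index-free target table; the TABLOCK hint
rem allows the bulk load to be minimally logged.
bcp mydb.dbo.newtable in cleaned.dat -n -T -S myserver -h "TABLOCK"
```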

And the final option is to use a plain INSERT. But a single INSERT
statement will not be good for your transaction log. It's better to
batch the insert, say, 100000 rows at a time, preferably with the database
set to simple recovery. You should batch on the clustered index of
the old table:

SELECT @start = 1
WHILE EXISTS (SELECT * FROM oldtable WHERE clustercol >= @start)
BEGIN
   INSERT newtable (...)
   SELECT ...
   FROM oldtable
   WHERE clustercol >= @start AND clustercol < @start + 100000
   SELECT @start = @start + 100000
END

Here the actual increment would depend on the nature of your clustered
key. If it's a date, maybe taking one month at a time is a good idea.

If the new table will have the same clustered index as the old table,
have the clustered index in place when you run the above, but wait with
adding non-clustered indexes until you have the data in place.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Apr 28 '07 #3
Mork69 (ml****@bigfoot.com) writes:
The quickest way to do this is to use INSERT INTO...SELECT FROM as it
is a non-logged operation
This is not correct. INSERT SELECT FROM is a fully-logged operation. You are
thinking of SELECT INTO which is a minimally logged operation. That is,
all that is logged are the extent allocations. There are no write operations
in SQL Server that are entirely non-logged.
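For clarity, the two statements being contrasted would look like this (a sketch using the table and function names from the original post):

```sql
-- SELECT INTO: creates newtable on the fly; minimally logged under
-- the simple or bulk-logged recovery model.
SELECT id, dbo.func1(col1) AS col1, dbo.func2(col2) AS col2
INTO   newtable
FROM   oldtable;

-- INSERT ... SELECT: newtable must already exist; every inserted row
-- is fully logged in the transaction log.
INSERT INTO newtable (id, col1, col2)
SELECT id, dbo.func1(col1), dbo.func2(col2)
FROM   oldtable;
```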
Apr 28 '07 #4
Exactly what is in those functions? Do they perform data access? Are
they written in T-SQL or in the CLR? I ask, because they could have a
great impact on performance.
Hello Erland!
Functions are written in T-SQL (I also wrote them in CLR, but in this
case they were slower). The original columns have incorrect datatypes
that use too much storage, so the functions check that values are in
the correct domain and return null if not
- which is a correct result, because such values are physically
impossible.

I wondered why you only wrote that I can't use "select into" for
partitioned tables.
I also expected I must create the table first (and then not be able to
use 'select into') because of the new datatypes - but of course I could write:
select id, cast(func1(col1) as <datatype>) into newtbl from oldtbl

Doing this multiple times with the appropriate where-clause,
followed by partition switches, maybe will be the solution; I'll
investigate...

... but if you have some comments, let me hear ;-)
Greetings
Bjorn D. Jensen
bj************@gmail.com
P.S. I already know your website, thanks for all that good
information!

Apr 29 '07 #5
B D Jensen (bj************@gmail.com) writes:
Functions are written in T-SQL (I also wrote them in CLR, but in this
case they were slower). The original columns have incorrect datatypes
that use too much storage, so the functions check that values are in
the correct domain and return null if not - which is a correct result,
because such values are physically impossible.
I would recommend that you have the expressions inline, at least if
you desire to cut down execution time.
I wondered why you only wrote that I can't use "select into" for
partitioned tables.
I assumed that it is not possible to create a partitioned table from
an existing one. But I have not worked much with partitioned tables, so
I could be wrong.
Apr 29 '07 #6
Hi again!
As I see it, the cast is not needed, because the functions return the
correct datatype.
Is that what you mean by "inline"??

I think there is another problem: the original table is not in the
right filegroup, and if I understand it right, it must be for the
partition switch to be ultra fast;
I'll investigate...
Greetings
Bjorn

Apr 29 '07 #7
B D Jensen (bj************@gmail.com) writes:
As I see it, the cast is not needed, because the functions return the
correct datatype.
Is that what you mean by "inline"??
I don't know what your functions do, but it seemed from your description
that I could expect something like:

CREATE FUNCTION makesmaller(@x bigint) RETURNS tinyint AS
BEGIN
   RETURN (CASE WHEN @x BETWEEN 0 AND 255 THEN @x ELSE NULL END)
END

Then in your INSERT operation you would rather write:

SELECT CASE WHEN bigcol BETWEEN 0 AND 255 THEN bigcol ELSE NULL END,
...

than

SELECT dbo.makesmaller(bigcol), ...

There is an overhead for the call, although it seems to be a lot less
in SQL 2005 than in SQL 2000.
I think there is another problem: the original table is not in the right
filegroup, and if I understand it right, it must be for the partition
switch to be ultra fast;
Yes, but as you are about to ditch the original table that is not much
of an issue, or?
Apr 29 '07 #8
Hi!
You nearly guessed one of my functions;
but I use <= and >= instead of 'between'.

I didn't understand the last part about "ditch" (what does that mean?).
Will the use of functions make the select into very slow?

Greetings
Bjorn

Apr 29 '07 #9
B D Jensen (bj************@gmail.com) writes:
I didn't understand the last part about "ditch" (what does that mean?).
To ditch = slänga, kasta, göra sig av med (Swedish for "throw away, discard, get rid of").
Will the use of functions make the select into very slow?
Slower. I cannot say how much slower, but I would never use functions for
this situation. Since this appears to be a one-off, code maintainability
does not seem to be important.
Apr 29 '07 #10
I assumed that it is not possible to create a partitioned table from
an existing one. But I have not worked much with partitioned tables, so
I could be wrong.
It is possible to move a non-partitioned table (actually a single partition)
into a partitioned table with ALTER TABLE...SWITCH PARTITION. The
source/target table must have the same schema (including indexes) and
table/indexes must reside on the same filegroup(s). Also, the source
table must have a check constraint on the partitioning column to ensure data
is within the target partition boundaries.

One caveat is that the index stats are not updated when data is switched
into the partitioned table so it's probably a good idea to update stats
after SWITCH.
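A minimal sketch of such a switch, assuming a hypothetical partition function, scheme, and boundary values (none of these names come from the thread):

```sql
-- Hypothetical partition setup: ranges on id, all on filegroup MYNEWFG.
CREATE PARTITION FUNCTION pf_id (int)
    AS RANGE RIGHT FOR VALUES (100000, 200000);
CREATE PARTITION SCHEME ps_id
    AS PARTITION pf_id ALL TO (MYNEWFG);

CREATE TABLE newtable (id int NOT NULL, col1 tinyint NULL) ON ps_id (id);

-- Staging table: same schema, same filegroup, plus a check constraint
-- matching the boundaries of the target partition.
CREATE TABLE stage2 (id int NOT NULL, col1 tinyint NULL,
    CONSTRAINT ck_stage2 CHECK (id >= 100000 AND id < 200000)) ON MYNEWFG;

-- ... bulk load stage2 and build any indexes newtable has ...

ALTER TABLE stage2 SWITCH TO newtable PARTITION 2;  -- metadata-only move
UPDATE STATISTICS newtable;  -- SWITCH does not refresh index stats
```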

--
Hope this helps.

Dan Guzman
SQL Server MVP

Apr 29 '07 #11
Dan Guzman (gu******@nospam-online.sbcglobal.net) writes:
It is possible to move a non-partitioned table (actually a single
partition) into a partitioned table with ALTER TABLE...SWITCH PARTITION.
The source/target table must have the same schema (including indexes)
and table/indexes must reside on the same filegroup(s). Also, the
source table must have a check constraint on the partitioning column to
ensure data is within the target partition boundaries.

Ah, that's great. That means it would be possible for Bjørn to create his
partitions with SELECT INTO, add the required indexes and constraints, and
then glue them together.

Thanks, Dan, for the information. ... I really need to start playing with
partitioning some day.
Apr 29 '07 #12
Hi Dan!
Thanks for the details about the requirements.
But I'm afraid I then must create the newtbl first,
because the old table is in the Primary filegroup.

And as I see it, there is no way of saying:
select id, func1(col1) into newtbl MYNEWFG from oldtbl.

So I think I have to look at unload/load now....
Best regards
Bjorn D. Jensen

Apr 30 '07 #13
B D Jensen (bj************@gmail.com) writes:
Thanks for the details about the requirements.
But I'm afraid I then must create the newtbl first,
because the old table is in the Primary filegroup.

And as I see it, there is no way of saying:
select id, func1(col1) into newtbl MYNEWFG from oldtbl.

So I think I have to look at unload/load now....
But doesn't ALTER DATABASE permit you to specify a different filegroup as
the default filegroup? You could do that, and then your SELECT INTO tables
should end up there. At least that is what I would expect.
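A sketch of that approach (the database name is an assumption; the filegroup name is from the thread):

```sql
-- Make MYNEWFG the default filegroup so SELECT INTO creates its
-- target table there.
ALTER DATABASE mydb MODIFY FILEGROUP MYNEWFG DEFAULT;

SELECT id, dbo.func1(col1) AS col1
INTO   newtbl   -- ends up on MYNEWFG, the current default
FROM   oldtbl;

-- Restore the original default afterwards.
ALTER DATABASE mydb MODIFY FILEGROUP [PRIMARY] DEFAULT;
```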
Apr 30 '07 #14
On 30 Apr., 09:34, Erland Sommarskog <esq...@sommarskog.se> wrote:
But doesn't ALTER DATABASE permit you to specify a different filegroup as
the default filegroup? You could do that, and then your SELECT INTO tables
should end up there. At least that is what I would expect.
very, very good point!
/Bjorn

Apr 30 '07 #15
On 29 Apr., 18:44, Erland Sommarskog <esq...@sommarskog.se> wrote:
Slower. I cannot say how much slower, but I would never use functions for
this situation. Since this appears to be a one-off, code maintainability
does not seem to be important.
I made a comparison for the case of converting to tinyint
and wrote a loop going from -1 million to +1 million:
using a T-SQL function: 72 seconds
using 'between' directly (not in a function): 70 seconds; it's in
between ;^)
using <= and >= directly: 67 seconds

(and the CLR function: 2 min 27 seconds)

So you are right: not writing it in a separate function is faster (in
this case),
so it depends (... ;^) on the situation whether the difference is too
costly.
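A timing harness along those lines might look like this (a sketch; dbo.makesmaller is the hypothetical function named earlier in the thread):

```sql
DECLARE @start datetime, @x bigint, @result tinyint;
SELECT @start = GETDATE(), @x = -1000000;

WHILE @x <= 1000000
BEGIN
    -- inline version; swap in dbo.makesmaller(@x) to time the UDF
    SELECT @result = CASE WHEN @x >= 0 AND @x <= 255 THEN @x ELSE NULL END;
    SELECT @x = @x + 1;
END;

SELECT DATEDIFF(second, @start, GETDATE()) AS elapsed_seconds;
```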

Maybe it's one time only, but if you think you can reuse it, then after
all there is less typing and, more importantly, your code is much more
readable, because it becomes shorter and more natural to read. And if
one finds a better implementation, you can just replace it without
affecting dependent code.

Again: it depends ;^)
Best regards
Bjørn

Apr 30 '07 #16
B D Jensen (bj************@gmail.com) writes:
I made a comparison for the case of converting to tinyint and wrote a
loop going from -1 million to +1 million: using a T-SQL function: 72
seconds; using 'between' directly (not in a function): 70 seconds;
using <= and >= directly: 67 seconds

So you are right: not writing it in a separate function is faster (in
this case),
I find it difficult to believe that there is any case where a scalar
T-SQL UDF would be faster.

Then again, with the numbers you present it's dubious whether you actually
have found a significant difference.
(and the CLR function: 2 min 27 seconds)
With a more complex operation, you would have had a different outcome.
I once did a test where I had to convert zoned numbers with fixed
decimal from an AS400 system. In that case a CLR function was faster
than all T-SQL solutions. I think I have heard that when you have more
than four operations, the CLR pays off.
Maybe it's one time only, but if you think you can reuse it, then after
all there is less typing and, more importantly, your code is much more
readable, because it becomes shorter and more natural to read.
Or you sit asking yourself "wonder what this function does".
Again: it depends ;^)
True. That's the answer to almost all performance questions.
Apr 30 '07 #17
On Apr 28, 4:55 pm, Erland Sommarskog <esq...@sommarskog.se> wrote:
This is not correct. INSERT SELECT FROM is a fully-logged operation. You are
thinking of SELECT INTO which is a minimally logged operation. That is,
all that is logged are the extent allocations. There are no write operations
in SQL Server that are entirely non-logged.
Yes, sorry, I was clearly having a bad day, SELECT INTO is what I
meant.

Regarding the statement that it is a "non logged" operation -
obviously all operations write to the transaction log in some way, I
was just using the term that is in general use that was erroneously
started by Books Online. In any case, as the only records that are
written are merely to log a table's creation, the difference is
somewhat irrelevant in this context.

May 1 '07 #18
Mork69 (ml****@bigfoot.com) writes:
Regarding the statement that it is a "non logged" operation -
obviously all operations write to the transaction log in some way, I
was just using the term that is in general use that was erroneously
started by Books Online.
Actually, not even that. Books Online for SQL 2000 is very careful to
talk about minimally logged. I looked in Books Online for SQL 6.5, which
indeed talks about non-logged, but that was loooong ago. And the
architecture was different way back then.
In any case, as the only records that are written are merely to log a
table's creation.
Not only. The extent allocations are also logged. If they weren't, and
the operation failed on an illegal convert operation half-way through,
you would be left with a table that had a couple of rows in it.
May 1 '07 #19
