
DWH problem: updating a table with a checkpoint every 1000 records

Hi,

Currently we're building a metadata-driven data warehouse in SQL
Server 2000. We're investigating the possibility of updating tables
with an enormous number of updates and inserts, and the use of
checkpoints (for simple recovery) and BACKUP LOG (for full recovery).

On several websites people mention the transaction log filling up
because its growth can't keep up with the updates. Therefore we want to
create a script which flushes the dirty pages to disk. It's not
quite clear to me how this works. The questions we have are:
* How does the process of updating, inserting and deleting work in SQL Server 2000
with respect to the log cache, log file, buffer cache, commit, checkpoint, etc.?
What happens when?
* As far as I can see now, I'm thinking of creating chunks of 1000 records with a
checkpoint after the query. SQL Server has implicit transactions as the default, so
it will not need a commit. Something like this?
* How do I create chunks of 1000 records automatically without creating an identity
field or something? Is there something like SELECT NEXT 1000?

Greetz,

Hennie
Jul 20 '05 #1
> * How does the process of updating, inserting and deleting work in SQL Server 2000
> with respect to the log cache, log file, buffer cache, commit, checkpoint, etc.?
> What happens when?

There's a discussion of this in the Books Online
<architec.chm::/8_ar_sa_8unn.htm>. In summary, all data modifications are
written to both the buffer cache and transaction log buffer cache. The log
buffer is written first. Modified buffer cache pages are written to disk by
either worker threads, the lazy writer or the checkpoint process. Worker
threads write data when they need to wait for a read i/o anyway. The lazy
writer process wakes up periodically and writes data in order to maintain a
reasonable number of free cache buffers. The checkpoint process writes all
dirty pages to disk periodically (about a minute by default) in order to
reduce startup recovery time.

Log buffers are written as they become full, when you issue a COMMIT and
during a checkpoint.

> * As far as I can see now, I'm thinking of creating chunks of 1000 records with a
> checkpoint after the query. SQL Server has implicit transactions as the default, so
> it will not need a commit. Something like this?
> * How do I create chunks of 1000 records automatically without creating an identity
> field or something? Is there something like SELECT NEXT 1000?

I'm not sure I understand what you are asking here. In the default
autocommit mode, each statement is a separate transaction.

It seems you are mostly concerned with transaction log space management.
The main consideration is to keep your transactions reasonably sized since
only committed data can be removed from the log during a log backup (FULL or
BULK_LOGGED recovery model) or checkpoint (SIMPLE model).
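
For reference, a minimal sketch of the two log-truncation triggers mentioned
above; the database name and backup path are hypothetical:

-- FULL or BULK_LOGGED recovery model: back up the log so that committed
-- log records can be removed from the active log
BACKUP LOG DWH_Staging TO DISK = 'D:\backup\DWH_Staging_log.bak'

-- SIMPLE recovery model: an explicit CHECKPOINT also allows committed
-- log records to be truncated (this normally happens automatically)
CHECKPOINT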

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Hennie de Nooijer" <hd********@hotmail.com> wrote in message
news:19**************************@posting.google.c om... Hi,

Currently we're a building a metadatadriven datawarehouse in SQL
Server 2000. We're investigating the possibility of the updating
tables with enormeous number of updates and insert and the use of
checkpoints (for simple recovery and Backup Log for full recovery).

On several website people speak about full transaction log and the
pace of growing can't keep up with the update. Therefore we want to
create a script which flushes the dirty pages to the disk. It's not
quite clear to me how it works. Questions we have is:
* How does the process of updating, insert and deleting works with SQL
Server 2000 with respect to log cache, log file, buffer cache, commit,
checkpoint, etc?
What happens when?
* As far as i can see now: i'm thinking of creating chunks of data of
1000 records with a checkpoint after the Query. SQL server has the
default of implicit transactions and so it will not need a commit.
Something like this?
* How do i create chunks of 1000 records automatically without
creating a identity field or something. Is there something like SELECT
NEXT 1000?

Greetz,

Hennie

Jul 20 '05 #2
Hennie de Nooijer (hd********@hotmail.com) writes:
> * How does the process of updating, inserting and deleting work in SQL Server 2000
> with respect to the log cache, log file, buffer cache, commit, checkpoint, etc.?
> What happens when?
> * As far as I can see now, I'm thinking of creating chunks of 1000 records with a
> checkpoint after the query. SQL Server has implicit transactions as the default, so
> it will not need a commit. Something like this?
> * How do I create chunks of 1000 records automatically without creating an identity
> field or something? Is there something like SELECT NEXT 1000?


I hope Dan's reply shed some light on how the log works.

As for breaking the work up in chunks, there is nothing built-in like:

UPDATE tbl
SET yadayada
...
WITH COMMIT EACH 1000 ROW

For DML statements you will have to find out how best to do the batching
from your data. Note that there is no reason to keep a strict batch size;
if you have some column that divides the data nicely, that can be used.
For instance, if you are handling order data, you can take one day or one
month at a time.
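
If no such column exists, one common SQL Server 2000 pattern is to cap the
batch size with SET ROWCOUNT and loop until nothing is left; a rough sketch
with hypothetical table and column names:

SET ROWCOUNT 100000                -- limit each UPDATE to 100,000 rows
WHILE 1 = 1
BEGIN
    UPDATE dbo.FactSales           -- hypothetical target table
    SET processed = 1
    WHERE processed = 0            -- hypothetical flag marking unprocessed rows
    IF @@ROWCOUNT = 0 BREAK        -- nothing left to update: we are done
END
SET ROWCOUNT 0                     -- reset so later statements are unaffected

In autocommit mode each UPDATE in the loop is its own transaction, so the log
only has to hold one batch of changes at a time (given regular log backups or
the SIMPLE recovery model).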

The one case where you can tell SQL Server to commit by each n row is
when you bulk load with BCP or BULK INSERT.
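
With BULK INSERT that looks roughly like this (the file path and table name
are hypothetical); bcp has an equivalent -b switch:

BULK INSERT dbo.StagingSales
FROM 'D:\load\sales.txt'
WITH (BATCHSIZE = 100000,          -- commit after every 100,000 rows
      FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n')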

I would also like to point out that selecting the batch size is a trade-off
between log size and speed. Particularly if the search condition that
identifies a batch takes time to execute, too small a batch size can be
costly. Depending somewhat on row size, 100,000 rows at a time is not an
unreasonable number; 1000 is far too small.

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #3
> > * How does the process of updating, inserting and deleting work in SQL Server 2000
> > with respect to the log cache, log file, buffer cache, commit, checkpoint, etc.?
> > What happens when?
>
> There's a discussion of this in the Books Online
> <architec.chm::/8_ar_sa_8unn.htm>.

How does this work?

> In summary, all data modifications are
> written to both the buffer cache and transaction log buffer cache. The log
> buffer is written first. Modified buffer cache pages are written to disk by
> either worker threads, the lazy writer or the checkpoint process. Worker
> threads write data when they need to wait for a read i/o anyway. The lazy
> writer process wakes up periodically and writes data in order to maintain a
> reasonable number of free cache buffers. The checkpoint process writes all
> dirty pages to disk periodically (about a minute by default) in order to
> reduce startup recovery time.

OK, clear.

> Log buffers are written as they become full, when you issue a COMMIT and
> during a checkpoint.

I thought that a commit is not necessarily a write to the disk. A
commit is just an end mark of a transaction, as far as I can see in the
documentation. A checkpoint will write the transactions to the
database.

> > * As far as I can see now, I'm thinking of creating chunks of 1000 records with a
> > checkpoint after the query. SQL Server has implicit transactions as the default, so
> > it will not need a commit. Something like this?
> > * How do I create chunks of 1000 records automatically without creating an identity
> > field or something? Is there something like SELECT NEXT 1000?
>
> I'm not sure I understand what you are asking here. In the default
> autocommit mode, each statement is a separate transaction.
>
> It seems you are mostly concerned with transaction log space management.
> The main consideration is to keep your transactions reasonably sized since
> only committed data can be removed from the log during a log backup (FULL or
> BULK_LOGGED recovery model) or checkpoint (SIMPLE model).

Yup, this is what I meant. The only question is: what is a reasonable
chunk? Perhaps I will create a parameter for this so I can tune it.
Jul 20 '05 #4
> I hope Dan's reply shed some light on how the log works.
Yup
> As for breaking the work up in chunks, there is nothing built-in like:
>
> UPDATE tbl
> SET yadayada
> ...
> WITH COMMIT EACH 1000 ROW

> For DML statements you will have to find out how best to do the batching
> from your data. Note that there is no reason to keep a strict batch size;
> if you have some column that divides the data nicely, that can be used.
> For instance, if you are handling order data, you can take one day or one
> month at a time.

Yeah, well, I have an identity column which I could use for chunking the
records. The disadvantage is that the implementation then depends on the
functionality of a column in the table. It's not what I want, but I'll
have to deal with it.
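
A rough sketch of chunking on that identity column, with hypothetical table,
column and key names:

DECLARE @lo INT, @hi INT, @batch INT
SET @batch = 100000                          -- chunk size, to be tuned
SELECT @lo = MIN(id), @hi = MAX(id)
FROM dbo.SourceTable                         -- hypothetical source with identity column id

WHILE @lo <= @hi
BEGIN
    UPDATE t
    SET t.amount = s.amount                  -- hypothetical column being refreshed
    FROM dbo.TargetTable t
    JOIN dbo.SourceTable s ON s.business_key = t.business_key
    WHERE s.id BETWEEN @lo AND @lo + @batch - 1
    SET @lo = @lo + @batch
END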

> The one case where you can tell SQL Server to commit by each n row is
> when you bulk load with BCP or BULK INSERT.

Well, that is not possible here. I'm reading from and inserting/updating
other tables in SQL Server, and BCP reads only from files.

> I would also like to point out that selecting the batch size is a trade-off
> between log size and speed. Particularly if the search condition that
> identifies a batch takes time to execute, too small a batch size can be
> costly. Depending somewhat on row size, 100,000 rows at a time is not an
> unreasonable number; 1000 is far too small.

Well, 1000 was just a hunch (is that the right word?). I'm thinking of
creating a parameter with which we can tune the chunk size.

Thanks to you all for your answers

Greetz
Hennie
Jul 20 '05 #5
> > There's a discussion of this in the Books Online
> > <architec.chm::/8_ar_sa_8unn.htm>.
>
> How does this work?


From the Books Online, you can go directly to the referenced topic by
clicking Go-->URL and pasting in the URL 'architec.chm::/8_ar_sa_8unn.htm'.
> > Log buffers are written as they become full, when you issue a COMMIT and
> > during a checkpoint.
>
> I thought that a commit is not necessarily a write to the disk. A
> commit is just an end mark of a transaction, as far as I can see in the
> documentation. A checkpoint will write the transactions to the
> database.


A COMMIT ensures all committed data are written to the log file so that
committed data are permanently persisted to the database. Modified data may
or may not have been written to data files because dirty data pages are
written separately by worker threads, the lazy writer or checkpoint process.
This is transparent to your application because data are always retrieved
from buffer cache.

Only committed data before the oldest uncommitted transaction can be removed
from the log. This occurs during a log backup in the FULL or BULK_LOGGED
recovery model or during a checkpoint in the SIMPLE model. Rather than
trying to take control of the checkpoint frequency, all you really need to
do is ensure your transactions are reasonably short and small.
> The only question is: what is a reasonable chunk? Perhaps I will create a
> parameter for this so I can tune it.


If you are running in the SIMPLE recovery model and have many small
insert/update/delete transactions, I suggest you simply let SQL Server
truncate the log automatically. However, if you modify a lot of data (e.g.
millions of rows) in a single statement or transaction, you can use the
techniques Erland mentioned to keep the log size reasonable. In my
experience, the optimal amount of data per transaction is somewhat less than
the amount of memory allocated to SQL Server.
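
For the first case, a minimal sketch of putting a staging database into the
SIMPLE recovery model (the database name is hypothetical):

-- switch the recovery model so the log is truncated automatically at checkpoints
ALTER DATABASE DWH_Staging SET RECOVERY SIMPLE

-- verify the current setting
SELECT DATABASEPROPERTYEX('DWH_Staging', 'Recovery')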

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Hennie de Nooijer" <hd********@hotmail.com> wrote in message
news:19**************************@posting.google.c om...
* How does the process of updating, insert and deleting works with SQL
Server 2000 with respect to log cache, log file, buffer cache, commit,
checkpoint, etc?
What happens when?


There's a discussion of this in the Books Online
<architec.chm::/8_ar_sa_8unn.htm>.

How does this works?
In summary, all data modifications are
written to both the buffer cache and transaction log buffer cache. The log buffer is written first. Modified buffer cache pages are written to disk by either worker threads, the lazy writer or the checkpoint process. Worker threads write data when they need to wait for a read i/o anyway. The lazy writer process wakes up periodically and writes data in order to maintain a reasonable number of free cache buffers. The checkpoint process writes all dirty pages to disk periodically (about a minute by default) in order to
reduce startup recovery time.

Ok clear..

Log buffers are written as they become full, when you issue a COMMIT and
during a checkpoint.

It thought that a commit is not neccessary a write to the disk. A
commit is just a end mark of a transaction as far i can see in the
documentation. A checkpoint will write the transactions to the
database.
* As far as i can see now: i'm thinking of creating chunks of data of
1000 records with a checkpoint after the Query. SQL server has the
default of implicit transactions and so it will not need a commit.
Something like this?
* How do i create chunks of 1000 records automatically without
creating a identity field or something. Is there something like SELECT
NEXT 1000?


I'm not sure I understand what you are asking here. In the default
autocommit mode, each statement is a separate transaction.

It seems you are mostly concerned with transaction log space management.
The main consideration is to keep your transactions reasonably sized since only committed data can be removed from the log during a log backup (FULL or BULK_LOGGED recovery model) or checkpoint (SIMPLE model).

Yup this is what i meant. Only the question is what is a reasonable
chunk? Perhaps i will create parameter for this so i can tune this.

Jul 20 '05 #6
[posted and mailed, please reply in news]

Hennie de Nooijer (hd********@hotmail.com) writes:
> Yeah, well, I have an identity column which I could use for chunking the
> records. The disadvantage is that the implementation then depends on the
> functionality of a column in the table. It's not what I want, but I'll
> have to deal with it.


One thing that I should have added is that you should check whether the
clustered index is suitable for chunking. That can reduce the time it
takes for SQL Server to locate the rows quite a bit.
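
For example, if the clustered index is on an order date, each batch becomes a
cheap range seek; a rough sketch with hypothetical table and column names:

DECLARE @day DATETIME
SET @day = '20040101'                        -- hypothetical starting point
WHILE @day < '20040201'
BEGIN
    UPDATE dbo.FactOrders                    -- hypothetical table clustered on order_date
    SET status = 'processed'                 -- hypothetical update
    WHERE order_date >= @day
      AND order_date < DATEADD(day, 1, @day)
    SET @day = DATEADD(day, 1, @day)
END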

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #7
