Update in SQL Server 2000 slow?

I have two tables:

T1 : Key as bigint, Data as char(20) - size: 61M records
T2 : Key as bigint, Data as char(20) - size: 5M records

T2 is the smaller, with 5 million records.

They both have clustered indexes on Key.

I want to do:

update T1 set Data = T2.Data
from T2
where T2.Key = T1.Key

The goal is to match Key values and update the Data field of T1 only
where they match. SQL Server seems to optimize this query fairly well,
doing an inner merge join on the Key fields; however, it then does a
hash match to get the Data fields, and this is taking FOREVER. It
takes something like 40 minutes to run the above query, when it seems
to me the data could be updated much more efficiently. I would expect
to see just a merge and update, like I would see in the following
query:

update T1 set Data = [someconstantdata]
from T2
where T2.Key = T1.Key and T2.Data = [someconstantdata]

The above works VERY quickly, and if I were to perform it 5 million
times (assuming the data in T2 is completely unique and I would need
to), it would still finish much sooner than the previous query. Why
won't SQL Server just match these up while it is merging the data and
update in one step? Can I make it do this? If I extracted the data in
sorted order into a flat file, I could write a program in ten minutes
to merge the two tables and update in one step, and it would fly
through this; but I imagine SQL Server is capable of doing it, and I
am just missing something.
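
(For reference, here is the same update written with an explicit
join - same tables as above; [Key] is bracketed only because KEY is a
reserved word in T-SQL:)

update T1
set Data = T2.Data
from T1
inner join T2 on T2.[Key] = T1.[Key]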

Any advice would be GREATLY appreciated!
Jul 20 '05 #1
3 replies, 10057 views
Dan Berlin (db*****@alum.rpi.edu) writes:
> I have two tables:
>
> T1 : Key as bigint, Data as char(20) - size: 61M records
> T2 : Key as bigint, Data as char(20) - size: 5M records
>
> T2 is the smaller, with 5 million records.
>
> They both have clustered indexes on Key.
>
> I want to do:
>
> update T1 set Data = T2.Data
> from T2
> where T2.Key = T1.Key
>
> The goal is to match Key values and update the Data field of T1 only
> where they match. SQL Server seems to optimize this query fairly
> well, doing an inner merge join on the Key fields; however, it then
> does a hash match to get the Data fields, and this is taking
> FOREVER. It takes something like 40 minutes to run the above query,
> when it seems to me the data could be updated much more efficiently.
> I would expect to see just a merge and update, like I would see in
> the following query:
>
> update T1 set Data = [someconstantdata]
> from T2
> where T2.Key = T1.Key and T2.Data = [someconstantdata]


This query is quite different. Here SQL Server can scan T2, and for
every row where Data has a matching value it can look up the key in T1.
Since SQL Server has statistics about the data, it can tell how many
hits the condition on T2.Data will get.

In your first query, you are not restricting T2 at all, so the whole
table has to be scanned. A nested-loop join would mean 5 million
lookups in T1 - probably not good. I would expect a merge join to be
possible, but that is still a scan of both tables.

First, I would add the condition:

WHERE (T1.Data <> T2.Data OR
       (T1.Data IS NULL AND T2.Data IS NOT NULL) OR
       (T1.Data IS NOT NULL AND T2.Data IS NULL))

so that you actually update only the rows that need updating.
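
Put together, the whole update might look like this - an untested
sketch, written with an explicit join, since I don't have your tables:

UPDATE T1
SET Data = T2.Data
FROM T1
JOIN T2 ON T2.[Key] = T1.[Key]
WHERE (T1.Data <> T2.Data OR
       (T1.Data IS NULL AND T2.Data IS NOT NULL) OR
       (T1.Data IS NOT NULL AND T2.Data IS NULL))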

If there are plenty of other columns in the tables, I would add
non-clustered indexes on (Key, Data) to both tables, since these
indexes would cover the query.
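
For example (the index names are just placeholders):

CREATE NONCLUSTERED INDEX ix_T1_Key_Data ON T1 ([Key], Data)
CREATE NONCLUSTERED INDEX ix_T2_Key_Data ON T2 ([Key], Data)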

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #2
Erland Sommarskog <so****@algonet.se> wrote in message news:<Xn*********************@127.0.0.1>...
> [previous message quoted in full - snipped]


This was very helpful, thank you!

However, there is still a large Hash Match/Aggregate being performed
that requires 45% (for a T2 of 2.5M records) of the resources for the
query. A complete table scan of the larger table accounts for 34% of
the query, the merge join for 19%, and the Hash Match for 45%,
effectively doubling the time the query takes to run. The larger my
T2 table is, the longer the hash takes, on a scale that is increasing
faster than linearly (exponentially? not sure). The hash seems to be
doing the following:

HASH: bmk1000, RESIDUAL: (bmk1000=bmk1000) (T2.Data = ANY(T2.Data))

This is from Query Analyzer's estimated execution plan. Do you know
how I can avoid this hash, or why it is necessary? It really, really
slows down the query to an unacceptable level.

Thanks again for the help!
Dan Berlin
Jul 20 '05 #3
Dan Berlin (db*****@alum.rpi.edu) writes:
> However, there is still a large Hash Match/Aggregate being performed
> that requires 45% (for a T2 of 2.5M records) of the resources for
> the query. [...] Do you know how I can avoid this hash, or why it is
> necessary? It really, really slows down the query to an unacceptable
> level.


Again, without access to the tables, it is difficult to give very good
suggestions. Query tuning is largely hands-on work.

But if the hashing is a bottleneck, and is growing more than linearly,
one idea is to run the update in chunks: take a reasonably sized
interval of the key values at a time.
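
For example, something along these lines - an untested sketch, and
the chunk size of 500000 is just a guess you would have to tune:

DECLARE @lo bigint, @hi bigint, @chunk bigint
SELECT @lo = MIN([Key]), @hi = MAX([Key]) FROM T2
SET @chunk = 500000

WHILE @lo <= @hi
BEGIN
   -- Update one key interval at a time to keep the hash input small.
   UPDATE T1
   SET Data = T2.Data
   FROM T1
   JOIN T2 ON T2.[Key] = T1.[Key]
   WHERE T1.[Key] >= @lo AND T1.[Key] < @lo + @chunk
     AND (T1.Data <> T2.Data OR
          (T1.Data IS NULL AND T2.Data IS NOT NULL) OR
          (T1.Data IS NOT NULL AND T2.Data IS NULL))

   SET @lo = @lo + @chunk
END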

The hashing is on Data - I would guess to locate the rows that need
updating. Hashing is probably better than a nested-loop join here.

Could you post:

o CREATE TABLE and CREATE INDEX statements for your tables?
o The query as it looks now?
o The query plan you get?

This would leave me a little less in the dark.
--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.
