Bytes IT Community

Update in SQL Server 2000 slow?

I have two tables:

T1 : Key as bigint, Data as char(20) - size: 61M records
T2 : Key as bigint, Data as char(20) - size: 5M records

T2 is the smaller, with 5 million records.

They both have clustered indexes on Key.

I want to do:

update T1 set Data = T2.Data
from T2
where T2.[Key] = T1.[Key]

The goal is to match Key values, and only update the data field of T1
if they match. SQL server seems to optimize this query fairly well,
doing an inner merge join on the Key fields, however, it then does a
Hash match to get the data fields and this is taking FOREVER. It
takes something like 40 mins to do the above query, where it seems to
me, the data could be updated much more efficiently. I would expect
to see just a merge and update, like I would see in the following
query:

update T1 set Data = [someconstantdata]
from T2
where T2.[Key] = T1.[Key] and T2.Data = [someconstantdata]

The above works VERY quickly, and if I were to perform the above query
5 million times (assuming that my data is completely unique in T2 and I
would need to), it would still finish much sooner than the previous
query. Why won't SQL Server just match these up while it is merging
the data and update in one step? Can I make it do this? If I
extracted the data in sorted order into a flat file, I could write a
program in ten minutes to merge the two tables and update in one
step, and it would fly through this; I imagine that SQL Server is
capable of doing it, and I am just missing something.
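For reference, the same update can be written with an explicit join; this is only a restatement of the query above (not a fix), with [Key] bracketed because KEY is a reserved word in T-SQL:

```sql
UPDATE T1
SET    Data = T2.Data
FROM   T1
JOIN   T2 ON T2.[Key] = T1.[Key]
```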

Any advice would be GREATLY appreciated!
Jul 20 '05 #1
3 Replies


Dan Berlin (db*****@alum.rpi.edu) writes:
> I have two tables:
>
> [rest of original question quoted in full - snipped]


This query is quite different. Here SQL Server can scan T2, and for
every row where Data has a matching value it can look up the key in T1.
Since SQL Server has statistics about the data, it can tell how many
hits the condition on T2.Data will get.

In your first query, you are not restricting T2, so you will have
to scan all of it. A nested loop join would mean 5 million lookups in
T1 - probably not good. I would expect a merge join to be possible, but
that still means a scan of both tables.

First I would add the condition:

WHERE (T1.Data <> T2.Data OR
T1.Data IS NULL AND T2.Data IS NOT NULL OR
T1.Data IS NOT NULL AND T2.Data IS NULL)

so that you update only the rows that actually need updating.
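Putting that condition together with the original update, the full statement might look roughly like this (a sketch using the table and column names from the original post; [Key] is bracketed because KEY is reserved in T-SQL):

```sql
UPDATE T1
SET    Data = T2.Data
FROM   T1
JOIN   T2 ON T2.[Key] = T1.[Key]
WHERE  T1.Data <> T2.Data
   OR (T1.Data IS NULL AND T2.Data IS NOT NULL)
   OR (T1.Data IS NOT NULL AND T2.Data IS NULL)
```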

If there are plenty of other columns in the table, I would add non-clustered
indexes on (Key, Data) for both tables, since these indexes would
cover the query.
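Sketched as DDL (the index names here are made up; a covering index lets the join be resolved from the index leaf pages without touching the base rows):

```sql
CREATE INDEX ix_T1_Key_Data ON T1 ([Key], Data)
CREATE INDEX ix_T2_Key_Data ON T2 ([Key], Data)
```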

--
Erland Sommarskog, SQL Server MVP, so****@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #2

Erland Sommarskog <so****@algonet.se> wrote in message news:<Xn*********************@127.0.0.1>...
> [quoted original question and reply snipped]


This was very helpful, thank you!

However, there is still a large Hash Match/Aggregate being performed
that accounts for 45% (for a T2 of 2.5M records) of the cost of the
query. A complete table scan of the larger table accounts for 34% of
the query, the merge join for 19%, and the Hash Match for 45%,
effectively doubling the time the query takes to run. The larger my
T2 table is, the longer the hash takes, on a scale that grows
faster than linearly (exponential? not sure). The hash seems to be
doing the following: HASH: bmk1000, RESIDUAL: (bmk1000=bmk1000)
(T2.Data = ANY(T2.Data))
This is from Query Analyzer's estimated execution plan. Do you
know how I can avoid this hash, or why it is necessary? It really
slows the query down to an unacceptable level.

Thanks again for the help!
Dan Berlin
Jul 20 '05 #3

Dan Berlin (db*****@alum.rpi.edu) writes:
> However, there is still a large Hash Match/Aggregate being performed
> [rest of quoted message snipped]


Again, without access to the tables, it is difficult to give very good
suggestions. Query tuning is very much hands-on work.

But if the hashing is a bottleneck, and is growing more than linearly,
one idea is to run the update in chunks: take a reasonably sized
interval of the key values at a time.
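One way to sketch that chunked approach (the batch size and loop bounds are assumptions to be tuned against the actual data; the differential WHERE from the earlier reply is repeated here so each batch skips rows that are already correct):

```sql
-- Walk the key range of T2 in fixed-size chunks,
-- updating only mismatched rows in each pass.
DECLARE @lo bigint, @hi bigint, @max bigint, @chunk bigint
SELECT @lo = MIN([Key]), @max = MAX([Key]) FROM T2
SET @chunk = 500000

WHILE @lo <= @max
BEGIN
   SET @hi = @lo + @chunk - 1

   UPDATE T1
   SET    Data = T2.Data
   FROM   T1
   JOIN   T2 ON T2.[Key] = T1.[Key]
   WHERE  T1.[Key] BETWEEN @lo AND @hi
     AND (T1.Data <> T2.Data
          OR (T1.Data IS NULL AND T2.Data IS NOT NULL)
          OR (T1.Data IS NOT NULL AND T2.Data IS NULL))

   SET @lo = @hi + 1
END
```

Smaller chunks also keep each transaction, and therefore the log growth per statement, bounded.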

The hashing is on Data, I would guess to locate the rows that need
updating. Hashing is probably better than a nested-loop join.

Could you post:

o CREATE TABLE and CREATE INDEX statements for your tables?
o The query as it looks now?
o The query plan you get?

This would leave me a little less in the dark.
--
Erland Sommarskog, SQL Server MVP, so****@algonet.se
Jul 20 '05 #4
