472,143 Members | 1,350 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,143 software developers and data experts.

Strange performance issue with UPDATE FROM

Hello!

I have this piece of SQL code:

UPDATE a
SET Field1 = c.Field1
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

This query takes hours to complete.

Now while trying to find out what's causing the poor performance (it
surely looks simple enough!) I've rewritten it to use temp tables:

SELECT a.GUID1, a.Field1, c.Type, b.DateFrom, b.DateTo INTO #temptable
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

UPDATE a SET Field1 = subsel.Field1
FROM (SELECT * FROM #temptable) AS subsel
WHERE subsel.GUID1 = a.GUID1

Now it completes in 10 seconds.

My question is why? Am I wrong in saying that the two batches above
produce same results? Is there something I've missed about the UPDATE
FROM syntax? Why would the first query perform THAT poorly?

Table sizes:
a: 24k rows
b: 268k rows
c: 260k rows

GUIDs are of type uniqueidentifier.

Any answers appreciated!

Regards,
// Richard

Jun 26 '07 #1
8 2443
Richard (na*******@gmail.com) writes:
I have this piece of SQL code:

UPDATE a
SET Field1 = c.Field1
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

This query takes hours to complete.

Now while trying to find out what's causing the poor performance (it
surely looks simple enough!) I've rewritten it to use temp tables:

SELECT a.GUID1, a.Field1, c.Type, b.DateFrom, b.DateTo INTO #temptable
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

UPDATE a SET Field1 = subsel.Field1
FROM (SELECT * FROM #temptable) AS subsel
WHERE subsel.GUID1 = a.GUID1

Now it completes in 10 seconds.

My question is why? Am I wrong in saying that the two batches above
produce same results? Is there something I've missed about the UPDATE
FROM syntax? Why would the first query perform THAT poorly?
One problem with UPDATE FROM is that you can update the same row
several times if your join conditions are not unique. What happens if
you run:

UPDATE a
SET Field = (SELECT c.Field1
FROM c
JOIN b ON c.GUID2 = b.GUID2
WHERE a.GUID1 = b.GUID1
AND c.type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo)

I'm most interested to know if the query succeds at all, or if it fails
with an error message. From the table sizes you have indicated, I would
expect an error, but I don't know how your tables are related.

As for the performance, investigating the query plan can give some ideas.
Without seeing query plans, the table definitions, the indexes etc,
it's difficult to say much useful.

Which version of SQL Server are you using?

Is @date a parameter to a stored procedure or a local variable?
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jun 26 '07 #2
On Tue, 26 Jun 2007 21:46:15 +0000 (UTC), Erland Sommarskog
<es****@sommarskog.sewrote:
>One problem with UPDATE FROM is that you can update the same row
several times if your join conditions are not unique. What happens if
you run:

UPDATE a
SET Field = (SELECT c.Field1
FROM c
JOIN b ON c.GUID2 = b.GUID2
WHERE a.GUID1 = b.GUID1
AND c.type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo)
BE CAREFUL WITH THIS!!

One thing that could happen from this UPDATE is that it sets Field =
NULL for rows that are untouched by the UPDATEs in the original post.
That happens if there are rows in the table being UPDATEd that do not
have matches in the subquery. The FROM/JOIN prevents that in the
original versions. I don't know if any such non-matching rows exist,
but it certainly seems possible with the date range and type tests
dropping rows from the subquery.

I believe this query would show if the original UPDATEs using FROM
result in the same row updated more than once, without possible impact
on the data.

SELECT A.PrimaryKey, count(*)
FROM A
JOIN B
ON A.GUID1 = B.GUID1
JOIN C
ON B.GUID2 = B.GUID2
WHERE C.type = 1
AND @date BETWEEN B.DateFrom AND B.DateTo)
GROUP BY A.PrimaryKey
HAVING COUNT(*) 1

Roy Harvey
Beacon Falls, CT
Jun 27 '07 #3

I hope you know better than to use GUIDs in an RDBMS except for
replication, never to use reserved words like "date" for data element
names or vague names like "type" -- the basic ISO-11179 rules, etc.
You also seem to confuse fields and columns, but let's skip the signs
of poor SQL practices for now.

If you rewrite this in statement into Standard SQL, you will see if
the join returns multiple rows instead of a scalar value. That would
let us know that the schema has serious design problems. The illegal
syntax you used can do multiple updates on each row; talk to an old
Sybase programmer about this problem.

UPDATE A SET field1
= (SELECT C.field1 FROM B, C
WHERE A.guid1 = B.guid1
AND B.guid2 = C.guid2 AND C.somekind_type = 1
AND @my_date BETWEEN B.start_date AND B.end_date);

Jun 27 '07 #4
Roy Harvey (ro********@snet.net) writes:
On Tue, 26 Jun 2007 21:46:15 +0000 (UTC), Erland Sommarskog
<es****@sommarskog.sewrote:
>>One problem with UPDATE FROM is that you can update the same row
several times if your join conditions are not unique. What happens if
you run:

UPDATE a
SET Field = (SELECT c.Field1
FROM c
JOIN b ON c.GUID2 = b.GUID2
WHERE a.GUID1 = b.GUID1
AND c.type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo)

BE CAREFUL WITH THIS!!

One thing that could happen from this UPDATE is that it sets Field =
NULL for rows that are untouched by the UPDATEs in the original post.
Right, Roy. I didn't add a WHERE clause, because it was more intended as a
test to see if the query would work at all. But I should have included the
warning.


--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jun 27 '07 #5
On Jun 27, 2:27 am, --CELKO-- <jcelko...@earthlink.netwrote:
I hope you know better than to use GUIDs in an RDBMS except for
replication, never to use reserved words like "date" for data element
names or vague names like "type" -- the basic ISO-11179 rules, etc.
You also seem to confuse fields and columns, but let's skip the signs
of poor SQL practices for now.
Well, yes, I actually do know better. The columns, variables and
tables in the query are renamed as I don't want to post production
code on the Internet. Also excuse the mixup between fields and
columns, I'm not a native English speaker.

One big problem (as i see it, and I'm by no means a SQL expert) is
that the db in question uses uniqueidentifier primary keys with
clustered indexes on those almost EVERYWHERE, and there is nothing I
can do to change that at the moment...Constructs like

FROM z
INNER JOIN a ON ..GUID = ..GUID
INNER JOIN b ON ..GUID = ..GUID
INNER JOIN c ON ..GUID = ..GUID
INNER JOIN d ON ..GUID = ..GUID
LEFT OUTER JOIN eON ..GUID = ..GUID
AND VERSION = (
SELECT MAX(VERSION) FROM f WHERE ..GUID = ..GUID)

make the queries run painfuly slow.

So the question is, is there ANYTHING I can do to optimize this type
of queries or is a redesign the only thing that would help?

@Erland:
I use MS SQL 2000 server and @date is a local variable =)

Jun 27 '07 #6
Richard (na*******@gmail.com) writes:
One big problem (as i see it, and I'm by no means a SQL expert) is
that the db in question uses uniqueidentifier primary keys with
clustered indexes on those almost EVERYWHERE, and there is nothing I
can do to change that at the moment...Constructs like
Clustered indexes on GUIDs requires a lot of skill in monitoring
fragmentation.

With the standard setup with a high fill factor, clustering on GUIDs is bad,
because you get page splits and fragmentation galore.

SQL Server MVP Greg Linwood suggested to me that clustering on GUIDs may
still be good for INSERT performance, if you create the indexes with a low
fill factor, say 50%. Now when you add new rows, there are good chance
that there is a hole to plug into. When the table starts to fill up, you
need to reindex again. But this strategy requires careful planning, and is
nothing for the armchair DBA.

If you are stuck with these clustered indexes, make you sure you set up
a reindexing job that runs regularly, and you should probably aim at
a lower fill factor. If not 50%, maybe 75-80%. It depends a bit how
big the INSERT frequency is. And use DBCC SHOWCONTIG to monitor
fragmentation.
FROM z
INNER JOIN a ON ..GUID = ..GUID
INNER JOIN b ON ..GUID = ..GUID
INNER JOIN c ON ..GUID = ..GUID
INNER JOIN d ON ..GUID = ..GUID
LEFT OUTER JOIN eON ..GUID = ..GUID
AND VERSION = (
SELECT MAX(VERSION) FROM f WHERE ..GUID = ..GUID)

make the queries run painfuly slow.

So the question is, is there ANYTHING I can do to optimize this type
of queries or is a redesign the only thing that would help?
With the information you have posted, it's impossible to tell. But
I would at least give defragmentation with DBCC DBREINDEX a chance
first if DBCC SHOWCONTIG show horrendeous numbers.

You could also consider adding covering non-clustered index on tables
where only a few columns of many are involved in the query.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jun 27 '07 #7
>Well, yes, I actually do know better. The columns, variables and tables in the query are renamed as I don't want to post production code on the Internet. <<

Understood. But look at how many production DBs posted here have such
flaws.
>Also excuse the mix up between fields and columns, I'm not a native English speaker. <<
And that means that your English is probably better than a native
English speaker :) But my objection is not English; it is RDBMS
versus File Systems. A big problem I see when I teach SQL is students
using SQL as if it were a sequential file system -- no constraints,
improper data types. no DRI actions, depending on applications to do
what DDL should do, etc.
>One big problem (as i see it, and I'm by no means a SQL expert) is that the db in question uses uniqueidentifier primary keys with clustered indexes on those almost EVERYWHERE,.. <<
You are doing very well for an amateur :) Yes, this is a major
problem and not just for performance. A uniqueidentifier cannot be a
key in a properly designed RDBMS by definition -- it is an attribute
of the hardware and not the data model. You cannot verify it with a
trusted external source, so you have no data integrity. And it is
bitch to write them out without making an error.

The programmers who do this are trying to mimic pointer chains and
build a linked list in SQL. They missed the whole idea of RDBMS.
>So the question is, is there ANYTHING I can do to optimize this type of queries or is a redesign the only thing that would help? <<
Not much. Clustered indexes are not going to help with the random
nature of a uniqueidentifier. Perhaps the best thing you can do is
kill the guy that did this to you and prevent him from coding again.

Jun 28 '07 #8
On Jun 26, 10:31 am, Richard <nasseg...@gmail.comwrote:
Hello!

I have this piece of SQL code:

UPDATE a
SET Field1 = c.Field1
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

This query takes hours to complete.

Now while trying to find out what's causing the poor performance (it
surely looks simple enough!) I've rewritten it to use temp tables:

SELECT a.GUID1, a.Field1, c.Type, b.DateFrom, b.DateTo INTO #temptable
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

UPDATE a SET Field1 = subsel.Field1
FROM (SELECT * FROM #temptable) AS subsel
WHERE subsel.GUID1 = a.GUID1

Now it completes in 10 seconds.

My question is why? Am I wrong in saying that the two batches above
produce same results? Is there something I've missed about the UPDATE
FROM syntax? Why would the first query perform THAT poorly?

Table sizes:
a: 24k rows
b: 268k rows
c: 260k rows

GUIDs are of type uniqueidentifier.

Any answers appreciated!

Regards,
// Richard
Not arguing with other party on correctness/standards etc., when you
create a temp table, you get statistics on it. So them optimizer has a
better estimate of number of rows to modify and may choose a better
plan.

http://sqlserver-tips.blogspot.com/

Jun 28 '07 #9

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

2 posts views Thread by Przemyslaw Mazur | last post: by
5 posts views Thread by MGB | last post: by
reply views Thread by Josué Maldonado | last post: by
10 posts views Thread by Henk Ernst Blok | last post: by
10 posts views Thread by shsandeep | last post: by
reply views Thread by leo001 | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.