Bytes | Software Development & Data Engineering Community

Strange performance issue with UPDATE FROM

Hello!

I have this piece of SQL code:

UPDATE a
SET Field1 = c.Field1
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

This query takes hours to complete.

Now while trying to find out what's causing the poor performance (it
surely looks simple enough!) I've rewritten it to use temp tables:

SELECT a.GUID1, a.Field1, c.Type, b.DateFrom, b.DateTo INTO #temptable
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

UPDATE a SET Field1 = subsel.Field1
FROM (SELECT * FROM #temptable) AS subsel
WHERE subsel.GUID1 = a.GUID1

Now it completes in 10 seconds.

My question is why? Am I wrong in saying that the two batches above
produce same results? Is there something I've missed about the UPDATE
FROM syntax? Why would the first query perform THAT poorly?

Table sizes:
a: 24k rows
b: 268k rows
c: 260k rows

GUIDs are of type uniqueidentifier.

Any answers appreciated!

Regards,
// Richard

Jun 26 '07 #1
Richard (na*******@gmail.com) writes:
I have this piece of SQL code:

UPDATE a
SET Field1 = c.Field1
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

This query takes hours to complete.

Now while trying to find out what's causing the poor performance (it
surely looks simple enough!) I've rewritten it to use temp tables:

SELECT a.GUID1, a.Field1, c.Type, b.DateFrom, b.DateTo INTO #temptable
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

UPDATE a SET Field1 = subsel.Field1
FROM (SELECT * FROM #temptable) AS subsel
WHERE subsel.GUID1 = a.GUID1

Now it completes in 10 seconds.

My question is why? Am I wrong in saying that the two batches above
produce same results? Is there something I've missed about the UPDATE
FROM syntax? Why would the first query perform THAT poorly?
One problem with UPDATE FROM is that you can update the same row
several times if your join conditions are not unique. What happens if
you run:

UPDATE a
SET Field = (SELECT c.Field1
             FROM c
             JOIN b ON c.GUID2 = b.GUID2
             WHERE a.GUID1 = b.GUID1
               AND c.type = 1
               AND @date BETWEEN b.DateFrom AND b.DateTo)

I'm most interested to know if the query succeeds at all, or if it fails
with an error message. From the table sizes you have indicated, I would
expect an error, but I don't know how your tables are related.

As for the performance, investigating the query plan can give some ideas.
Without seeing the query plans, the table definitions, the indexes etc.,
it's difficult to say much that is useful.

Which version of SQL Server are you using?

Is @date a parameter to a stored procedure or a local variable?
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jun 26 '07 #2
On Tue, 26 Jun 2007 21:46:15 +0000 (UTC), Erland Sommarskog
<es****@sommarskog.se> wrote:
>One problem with UPDATE FROM is that you can update the same row
several times if your join conditions are not unique. What happens if
you run:

UPDATE a
SET Field = (SELECT c.Field1
FROM c
JOIN b ON c.GUID2 = b.GUID2
WHERE a.GUID1 = b.GUID1
AND c.type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo)
BE CAREFUL WITH THIS!!

One thing that could happen from this UPDATE is that it sets Field =
NULL for rows that are untouched by the UPDATEs in the original post.
That happens if there are rows in the table being UPDATEd that do not
have matches in the subquery. The FROM/JOIN prevents that in the
original versions. I don't know if any such non-matching rows exist,
but it certainly seems possible with the date range and type tests
dropping rows from the subquery.
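If you do want to try Erland's form, one possible guard (a sketch only, untested, and assuming GUID1 is the right correlation) is to repeat the match in a WHERE EXISTS so non-matching rows are left alone:

```sql
UPDATE a
SET Field = (SELECT c.Field1
             FROM c
             JOIN b ON c.GUID2 = b.GUID2
             WHERE a.GUID1 = b.GUID1
               AND c.type = 1
               AND @date BETWEEN b.DateFrom AND b.DateTo)
WHERE EXISTS (SELECT *
              FROM c
              JOIN b ON c.GUID2 = b.GUID2
              WHERE a.GUID1 = b.GUID1
                AND c.type = 1
                AND @date BETWEEN b.DateFrom AND b.DateTo)
```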

I believe this query would show if the original UPDATEs using FROM
result in the same row being updated more than once, without any
possible impact on the data.

SELECT A.PrimaryKey, COUNT(*)
FROM A
JOIN B
ON A.GUID1 = B.GUID1
JOIN C
ON B.GUID2 = C.GUID2
WHERE C.type = 1
AND @date BETWEEN B.DateFrom AND B.DateTo
GROUP BY A.PrimaryKey
HAVING COUNT(*) > 1

Roy Harvey
Beacon Falls, CT
Jun 27 '07 #3

I hope you know better than to use GUIDs in an RDBMS except for
replication, never to use reserved words like "date" for data element
names or vague names like "type" -- the basic ISO-11179 rules, etc.
You also seem to confuse fields and columns, but let's skip the signs
of poor SQL practices for now.

If you rewrite this statement into Standard SQL, you will see if
the join returns multiple rows instead of a scalar value. That would
let us know that the schema has serious design problems. The illegal
syntax you used can do multiple updates on each row; talk to an old
Sybase programmer about this problem.

UPDATE A SET field1
= (SELECT C.field1 FROM B, C
WHERE A.guid1 = B.guid1
AND B.guid2 = C.guid2 AND C.somekind_type = 1
AND @my_date BETWEEN B.start_date AND B.end_date);

Jun 27 '07 #4
Roy Harvey (ro********@snet.net) writes:
On Tue, 26 Jun 2007 21:46:15 +0000 (UTC), Erland Sommarskog
<es****@sommarskog.se> wrote:
>>One problem with UPDATE FROM is that you can update the same row
several times if your join conditions are not unique. What happens if
you run:

UPDATE a
SET Field = (SELECT c.Field1
FROM c
JOIN b ON c.GUID2 = b.GUID2
WHERE a.GUID1 = b.GUID1
AND c.type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo)

BE CAREFUL WITH THIS!!

One thing that could happen from this UPDATE is that it sets Field =
NULL for rows that are untouched by the UPDATEs in the original post.
Right, Roy. I didn't add a WHERE clause, because it was more intended as a
test to see if the query would work at all. But I should have included the
warning.


--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jun 27 '07 #5
On Jun 27, 2:27 am, --CELKO-- <jcelko...@earthlink.net> wrote:
I hope you know better than to use GUIDs in an RDBMS except for
replication, never to use reserved words like "date" for data element
names or vague names like "type" -- the basic ISO-11179 rules, etc.
You also seem to confuse fields and columns, but let's skip the signs
of poor SQL practices for now.
Well, yes, I actually do know better. The columns, variables and
tables in the query are renamed as I don't want to post production
code on the Internet. Also excuse the mix-up between fields and
columns; I'm not a native English speaker.

One big problem (as I see it, and I'm by no means a SQL expert) is
that the db in question uses uniqueidentifier primary keys with
clustered indexes on those almost EVERYWHERE, and there is nothing I
can do to change that at the moment... Constructs like

FROM z
INNER JOIN a ON ..GUID = ..GUID
INNER JOIN b ON ..GUID = ..GUID
INNER JOIN c ON ..GUID = ..GUID
INNER JOIN d ON ..GUID = ..GUID
LEFT OUTER JOIN e ON ..GUID = ..GUID
AND VERSION = (
SELECT MAX(VERSION) FROM f WHERE ..GUID = ..GUID)

make the queries run painfully slow.

So the question is, is there ANYTHING I can do to optimize this type
of query, or is a redesign the only thing that would help?

@Erland:
I use MS SQL Server 2000 and @date is a local variable =)

Jun 27 '07 #6
Richard (na*******@gmail.com) writes:
One big problem (as I see it, and I'm by no means a SQL expert) is
that the db in question uses uniqueidentifier primary keys with
clustered indexes on those almost EVERYWHERE, and there is nothing I
can do to change that at the moment... Constructs like
Clustered indexes on GUIDs require a lot of skill in monitoring
fragmentation.

With the standard setup with a high fill factor, clustering on GUIDs is bad,
because you get page splits and fragmentation galore.

SQL Server MVP Greg Linwood suggested to me that clustering on GUIDs may
still be good for INSERT performance, if you create the indexes with a low
fill factor, say 50%. Now when you add new rows, there is a good chance
that there is a hole to plug into. When the table starts to fill up, you
need to reindex again. But this strategy requires careful planning, and is
nothing for the armchair DBA.

If you are stuck with these clustered indexes, make sure you set up
a reindexing job that runs regularly, and you should probably aim at
a lower fill factor. If not 50%, maybe 75-80%. It depends a bit on how
big the INSERT frequency is. And use DBCC SHOWCONTIG to monitor
fragmentation.
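As a sketch (the table name is a placeholder, and 75 is only a starting
point for the fill factor):

```sql
-- Look at Logical Scan Fragmentation and Scan Density first
DBCC SHOWCONTIG ('a') WITH ALL_INDEXES

-- Rebuild all indexes on the table with a 75% fill factor
DBCC DBREINDEX ('a', '', 75)
```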
FROM z
INNER JOIN a ON ..GUID = ..GUID
INNER JOIN b ON ..GUID = ..GUID
INNER JOIN c ON ..GUID = ..GUID
INNER JOIN d ON ..GUID = ..GUID
LEFT OUTER JOIN e ON ..GUID = ..GUID
AND VERSION = (
SELECT MAX(VERSION) FROM f WHERE ..GUID = ..GUID)

make the queries run painfully slow.

So the question is, is there ANYTHING I can do to optimize this type
of query, or is a redesign the only thing that would help?
With the information you have posted, it's impossible to tell. But
I would at least give defragmentation with DBCC DBREINDEX a chance
first, if DBCC SHOWCONTIG shows horrendous numbers.

You could also consider adding a covering non-clustered index on tables
where only a few of many columns are involved in the query.
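For the UPDATE in your original post, such an index on b might look like
this (a sketch only; SQL Server 2000 has no INCLUDE clause, so the extra
columns go into the key, and you should verify against the actual plan):

```sql
CREATE NONCLUSTERED INDEX ix_b_guid1_dates
ON b (GUID1, DateFrom, DateTo, GUID2)
```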
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
Jun 27 '07 #7
>Well, yes, I actually do know better. The columns, variables and tables in the query are renamed as I don't want to post production code on the Internet. <<

Understood. But look at how many production DBs posted here have such
flaws.
>Also excuse the mix up between fields and columns, I'm not a native English speaker. <<
And that means that your English is probably better than a native
English speaker's :) But my objection is not about English; it is RDBMS
versus file systems. A big problem I see when I teach SQL is students
using SQL as if it were a sequential file system -- no constraints,
improper data types, no DRI actions, depending on applications to do
what DDL should do, etc.
>One big problem (as I see it, and I'm by no means a SQL expert) is that the db in question uses uniqueidentifier primary keys with clustered indexes on those almost EVERYWHERE... <<
You are doing very well for an amateur :) Yes, this is a major
problem and not just for performance. A uniqueidentifier cannot be a
key in a properly designed RDBMS by definition -- it is an attribute
of the hardware and not the data model. You cannot verify it with a
trusted external source, so you have no data integrity. And it is a
bitch to write them out without making an error.

The programmers who do this are trying to mimic pointer chains and
build a linked list in SQL. They missed the whole idea of RDBMS.
>So the question is, is there ANYTHING I can do to optimize this type of queries or is a redesign the only thing that would help? <<
Not much. Clustered indexes are not going to help with the random
nature of a uniqueidentifie r. Perhaps the best thing you can do is
kill the guy that did this to you and prevent him from coding again.

Jun 28 '07 #8
On Jun 26, 10:31 am, Richard <nasseg...@gmail.com> wrote:
Hello!

I have this piece of SQL code:

UPDATE a
SET Field1 = c.Field1
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

This query takes hours to complete.

Now while trying to find out what's causing the poor performance (it
surely looks simple enough!) I've rewritten it to use temp tables:

SELECT a.GUID1, a.Field1, c.Type, b.DateFrom, b.DateTo INTO #temptable
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
AND @date BETWEEN b.DateFrom AND b.DateTo

UPDATE a SET Field1 = subsel.Field1
FROM (SELECT * FROM #temptable) AS subsel
WHERE subsel.GUID1 = a.GUID1

Now it completes in 10 seconds.

My question is why? Am I wrong in saying that the two batches above
produce same results? Is there something I've missed about the UPDATE
FROM syntax? Why would the first query perform THAT poorly?

Table sizes:
a: 24k rows
b: 268k rows
c: 260k rows

GUIDs are of type uniqueidentifier.

Any answers appreciated!

Regards,
// Richard
Not arguing with the other parties on correctness/standards etc., but when
you create a temp table, you get statistics on it. So the optimizer has a
better estimate of the number of rows to modify and may choose a better
plan.
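That also suggests a further tweak to your temp-table version: give
#temptable an index so the final join is cheap, and join it directly
instead of through the derived table (a sketch, untested):

```sql
CREATE CLUSTERED INDEX ix_temp ON #temptable (GUID1)

UPDATE a
SET Field1 = t.Field1
FROM #temptable AS t
WHERE t.GUID1 = a.GUID1
```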

http://sqlserver-tips.blogspot.com/

Jun 28 '07 #9
