Bizarre slow query problem (again)

wizofaus

I previously posted about a problem where it seemed that changing the
case of the word "BY" in a SELECT query was causing it to run much much
faster.

Now I've hit the same thing again, where basically almost any change I
make to how the query is executed (so that it still performs the same
function) causes the performance to jump from a dismal 7 or 8 seconds
to instantaneous. It's a very simple query of the form:

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0

which was running fine until a moment ago, when it suddently started
running hopelessly slowly. If change anything in the query to
lowercase (or the Min to uppercase), it runs fine again. Last time
someone suggested something about a bad plan being cached, and after a
bit of research I found the commands DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE. Sure enough, after running these, the query started
running fine again. The question is

a) why is this happening? Is it a bug in my code, or in SQL server?
b) is it worth detecting it and fixing it automatically? I.e, should I
put some code in that notices that a query is running far too slowly,
then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that
cause other problems?

Thanks

Jan 11 '07 #1

Subscribe Post Reply

5460

Uri Dimant

Hi
http://blogs.msdn.com/khen1234/archi...02/424228.aspx

<wi******@hotmail.comwrote in message
news:11**********************@77g2000hsv.googlegro ups.com...

>I previously posted about a problem where it seemed that changing the
case of the word "BY" in a SELECT query was causing it to run much much
faster.

Now I've hit the same thing again, where basically almost any change I
make to how the query is executed (so that it still performs the same
function) causes the performance to jump from a dismal 7 or 8 seconds
to instantaneous. It's a very simple query of the form:

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0

which was running fine until a moment ago, when it suddently started
running hopelessly slowly. If change anything in the query to
lowercase (or the Min to uppercase), it runs fine again. Last time
someone suggested something about a bad plan being cached, and after a
bit of research I found the commands DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE. Sure enough, after running these, the query started
running fine again. The question is

a) why is this happening? Is it a bug in my code, or in SQL server?
b) is it worth detecting it and fixing it automatically? I.e, should I
put some code in that notices that a query is running far too slowly,
then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that
cause other problems?

Thanks

Jan 11 '07 #2

wizofaus

Uri Dimant wrote:

Hi
http://blogs.msdn.com/khen1234/archi...02/424228.aspx

Thanks for that...amazingly enough it turned that that was exactly my
problem, although I'm using ad-hoc queries rather than stored procs. I
did some more testing, and it turned out that it was because it was
executing the same query twice, the first time with an atypical
parameter value, and the second time with a more typical one, that the
query was running so slowly. That is, executing

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 999
followed by
SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 123

Caused the second query to run absurdly slowly, because in the first
case only very few rows in the table had MyKey = 999 whereas almost
every row had MyKey = 123. After doing a DBCC FREEPROCCACHE and
swapping the queries around, they both ran fine.

In the end I ended up de-parameterizing the query just for this case,
but now I'm worried - how can I be sure that my other queries won't
suffer from the same problem? Should I never use parameters because of
this possibility?

Jan 11 '07 #3

Uri Dimant

In the end I ended up de-parameterizing the query just for this case,

but now I'm worried - how can I be sure that my other queries won't
suffer from the same problem? Should I never use parameters because of
this possibility?

An ability using parameters is very powerful , don't afraid using parameters
, just test it carefuly

<wi******@hotmail.comwrote in message
news:11**********************@i56g2000hsf.googlegr oups.com...

Uri Dimant wrote:
>Hi
http://blogs.msdn.com/khen1234/archi...02/424228.aspx

Thanks for that...amazingly enough it turned that that was exactly my
problem, although I'm using ad-hoc queries rather than stored procs. I
did some more testing, and it turned out that it was because it was
executing the same query twice, the first time with an atypical
parameter value, and the second time with a more typical one, that the
query was running so slowly. That is, executing

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 999
followed by
SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0, with @0 = 123

Caused the second query to run absurdly slowly, because in the first
case only very few rows in the table had MyKey = 999 whereas almost
every row had MyKey = 123. After doing a DBCC FREEPROCCACHE and
swapping the queries around, they both ran fine.

In the end I ended up de-parameterizing the query just for this case,
but now I'm worried - how can I be sure that my other queries won't
suffer from the same problem? Should I never use parameters because of
this possibility?

Jan 11 '07 #4

wizofaus

Uri Dimant wrote:

In the end I ended up de-parameterizing the query just for this case,
but now I'm worried - how can I be sure that my other queries won't
suffer from the same problem? Should I never use parameters because of
this possibility?
An ability using parameters is very powerful , don't afraid using parameters
, just test it carefuly

Sure, except that the content of the database is out of my control -
this particular scenario (where nearly all the records matched a
particular key, but the query was first run against a different key)
could easily arrise in a production environment. More to the point,
I've seen no evidence that I'm getting any performance benefits from
using parameterized queries.
I suspect I will at least add a configuration option to avoid
parameterized queries (relatively straightforward, as I have a layer of
code that handles query parameters) if I see a problem like this again.

Jan 11 '07 #5

Bob Barrows [MVP]

wi******@hotmail.com wrote:

Uri Dimant wrote:

>>In the end I ended up de-parameterizing the query just for this
case,

"de-parameterizing"? You mean changing to dynamic sql and leaving yourself
vulnerable to sql injection??

>>but now I'm worried - how can I be sure that my other queries won't
suffer from the same problem? Should I never use parameters
because of this possibility?

An ability using parameters is very powerful , don't afraid using
parameters , just test it carefuly

Sure, except that the content of the database is out of my control -
this particular scenario (where nearly all the records matched a
particular key, but the query was first run against a different key)
could easily arrise in a production environment. More to the point,
I've seen no evidence that I'm getting any performance benefits from
using parameterized queries.
I suspect I will at least add a configuration option to avoid
parameterized queries (relatively straightforward, as I have a layer
of
code that handles query parameters) if I see a problem like this
again.

This is a ridiculous overreaction. Problems due to parameter-sniffing are
too rare to justify eliminating the benefits of using parameters. Talk about
"throwing the baby out with te bath water".

The article showed two, no three, ways to alleviate the problems caused by
parameter sniffing and still use parameters. So what do you do? ignore the
article's advice and "de-parameterize" your query...

--
Microsoft MVP - ASP/ASP.NET
Please reply to the newsgroup. This email account is my spam trap so I
don't check it very often. If you must reply off-line, then remove the
"NO SPAM"

Jan 11 '07 #6

Stephen Howe

a) why is this happening? Is it a bug in my code, or in SQL server?

b) is it worth detecting it and fixing it automatically? I.e, should I
put some code in that notices that a query is running far too slowly,
then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that
cause other problems?

Pardon me but

Why did you not say which version of SQL Server you are running on?
And why not mention the version of MDAC you have installed (since you posted
to microsoft.public.data.ado)?
And why not mention something about the Primary Keys & Indexes of the table
you are querying - I take it you have an index on MyKey?
And have you checked the documentation, whitepapers, MSDN for details on
query performance?
And have you checked the query plan in Query Analyser?

Cheers

Stephen Howe

Jan 11 '07 #7

wizofaus

Bob Barrows [MVP] wrote:

wi******@hotmail.com wrote:
Uri Dimant wrote:
>In the end I ended up de-parameterizing the query just for this
case,

"de-parameterizing"? You mean changing to dynamic sql and leaving yourself
vulnerable to sql injection??

No, because the parameter values are fully under my control - they are
not submitted directly by the user. At any rate, in this case it's
always just a simple integer.

>

>but now I'm worried - how can I be sure that my other queries won't
suffer from the same problem? Should I never use parameters
because of this possibility?

An ability using parameters is very powerful , don't afraid using
parameters , just test it carefuly

Sure, except that the content of the database is out of my control -
this particular scenario (where nearly all the records matched a
particular key, but the query was first run against a different key)
could easily arrise in a production environment. More to the point,
I've seen no evidence that I'm getting any performance benefits from
using parameterized queries.
I suspect I will at least add a configuration option to avoid
parameterized queries (relatively straightforward, as I have a layer
of
code that handles query parameters) if I see a problem like this
again.

This is a ridiculous overreaction. Problems due to parameter-sniffing are
too rare to justify eliminating the benefits of using parameters. Talk about
"throwing the baby out with te bath water".

Well, yes, but as I said, the testing I've done has revealed that my
app definitely is suffering badly from parameter-sniffing problems
(I've come across yet another one since), and that the only reliable
way I've found to solve it is to NOT use parameters, which doesn't seem
to be adversely affecting performance.

>
The article showed two, no three, ways to alleviate the problems caused by
parameter sniffing and still use parameters. So what do you do? ignore the
article's advice and "de-parameterize" your query...

Yes, but they use stored procs. I'm trying to avoid stored procs in
order to keep RDBMS independence (although for the time being, we've no
immediate need to support other databases).

Jan 11 '07 #8

Alex Kuznetsov

wizof...@hotmail.com wrote:

>
Yes, but they use stored procs. I'm trying to avoid stored procs in
order to keep RDBMS independence (although for the time being, we've no
immediate need to support other databases).

I would not do that unless there is a very strong business reason to do
so.
IMO maintaining RDBMS independence is like living in an RV instead of
living in a house - you get less comfort for your money, you pay a high
price for your mobility.
Achieving true RDBMS independence is both complex and expensive.
Details here

http://www.devx.com/dbzone/Article/32852

Under most circumstances I would settle down and use SQL Server
proprietary features to get the biggest bang for my buck. Should a need
arise to move - I would move and settle down again.

-----------------------
Alex Kuznetsov
http://sqlserver-tips.blogspot.com/
http://sqlserver-puzzles.blogspot.com/

Jan 11 '07 #9

Roy Harvey

On 11 Jan 2007 08:35:45 -0800, "Alex Kuznetsov"
<AK************@hotmail.COMwrote:

>IMO maintaining RDBMS independence is like living in an RV instead of
living in a house - you get less comfort for your money, you pay a high
price for your mobility.

I like it!

Roy Harvey
Beacon Falls, CT

Jan 11 '07 #10

Gert-Jan Strik

IMO maintaining RDBMS independence is like living in an RV instead of

living in a house - you get less comfort for your money, you pay a high
price for your mobility.
Achieving true RDBMS independence is both complex and expensive.

Hahaha, now that is picture that will never make it in any of Joe
Celko's books :-)

Gert-Jan

Jan 11 '07 #11

swaroop.atre

be EXTREMELY wary when using DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE on a production machine. They clear all cached SPs,
queries and plans. The instance is bound to run under extreme stress
for a considerable amount of time.

wizof...@hotmail.com wrote:

I previously posted about a problem where it seemed that changing the
case of the word "BY" in a SELECT query was causing it to run much much
faster.

Now I've hit the same thing again, where basically almost any change I
make to how the query is executed (so that it still performs the same
function) causes the performance to jump from a dismal 7 or 8 seconds
to instantaneous. It's a very simple query of the form:

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0

which was running fine until a moment ago, when it suddently started
running hopelessly slowly. If change anything in the query to
lowercase (or the Min to uppercase), it runs fine again. Last time
someone suggested something about a bad plan being cached, and after a
bit of research I found the commands DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE. Sure enough, after running these, the query started
running fine again. The question is

a) why is this happening? Is it a bug in my code, or in SQL server?
b) is it worth detecting it and fixing it automatically? I.e, should I
put some code in that notices that a query is running far too slowly,
then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that
cause other problems?

Thanks

Jan 11 '07 #12

wizofaus

swaroop.a...@gmail.com wrote:

be EXTREMELY wary when using DBCC DROPCLEANBUFFERS and DBCC
FREEPROCCACHE on a production machine. They clear all cached SPs,
queries and plans. The instance is bound to run under extreme stress
for a considerable amount of time.

The instance? You mean it affects all databases?
In this case, I determined I'd have to do it before every single query
call, so obviously that's not practical.

Jan 11 '07 #13

Erland Sommarskog

(wi******@hotmail.com) writes:

Sure, except that the content of the database is out of my control -
this particular scenario (where nearly all the records matched a
particular key, but the query was first run against a different key)
could easily arrise in a production environment. More to the point,
I've seen no evidence that I'm getting any performance benefits from
using parameterized queries.
I suspect I will at least add a configuration option to avoid
parameterized queries (relatively straightforward, as I have a layer of
code that handles query parameters) if I see a problem like this again.

Which version of SQL Server are you using? Here is a test that you can
try to see that you can do to actually test the benefit of
parameterised queries. First create this database:

CREATE DATABASE many_sps
go
USE many_sps
go
DECLARE @sql nvarchar(4000),
@x int
SELECT @x = 1000
WHILE @x 0
BEGIN
SELECT @sql = 'CREATE PROCEDURE abc_' + ltrim(str(@x)) +
'_sp @orderid int AS
SELECT O.OrderID, O.OrderDate, O.CustomerID, C.CompanyName,
Prodcnt = OD.cnt, Totalsum = OD.total
FROM Northwind..Orders O
JOIN Northwind..Customers C ON O.CustomerID = C.CustomerID
JOIN (SELECT OrderID, cnt = COUNT(*), total = SUM(Quantity * UnitPrice)
FROM Northwind..[Order Details]
GROUP BY OrderID) AS OD ON OD.OrderID = O.OrderID
WHERE O.OrderID = @orderid'
EXEC(@sql)
SELECT @x = @x - 1
END

(Don't worry if you don't have Northwind on your server, you are not going
to run these procedures.)

Then use F7 to get the Summary page, and navigate to the Stored Procedures
node for many_sps. Select some 200 procedures, right-click and select
Script As Create To New Query Window. Go for a cup of coffee - this will
take some time depending on your hardware.

When the operation has completed (or you have gotten tired of waiting
and killed SSMS), issue this command:

ALTER DATABASE db SET PARAMETERIZATION FORCED

Redo the scripting operation. It will now complete in five seconds.

The reason for this is that SQL Server Management Studio does not use
parameterised queries. For every procedure it scripts, Mgmt Studio
issues around five queries. All these queries makes it to the
cache that explodes, and all these queries are compiled.

When you set a database to forced parameterisation, SQL Server will
auto-parameterise all statements (with some exceptions documented in
Books Online); normally it only auto-parameterise very simple queries.
In the case of Mgmt Studio it's reallly a go-faster switch.

So dismissing caching of parameterised queries can be a serious mistake.
But it is certainly true that there are situations where parameter
sniffing can be a problem. If it is possible for you to tell in the
appliocation "this is an odd value that needs a special plan", then
you can modify the query text by adding a redudant condition like
"AND 1 = 1". Actually as you have found, changing "BY" to "by" or even
adding extra spaces help. This is because the lookup in the cache
is done on a hash without first collapsing spaces or parsing the
query text.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Jan 11 '07 #14

Erland Sommarskog

Erland Sommarskog (es****@sommarskog.se) writes:

When the operation has completed (or you have gotten tired of waiting
and killed SSMS), issue this command:

ALTER DATABASE db SET PARAMETERIZATION FORCED

Redo the scripting operation. It will now complete in five seconds.

By the way, this is something important for your application as well.
Say that a DBA finds out that your app is thrashing the cache by not
using parameterised queries, and sets the database to forced
parameterisation, you will get back the behaviour you have now.

A bettery remedy is to add OPTION (RECOMPILE) at the end of sensitive
queries. This forces a statement recompile, and the query will not be
put in cache. This means that you can still use parameterised queries
and get the other benefits of it. (Protection for SQL injection and
repsecting the user's regional settings.) You also avoid thrashing
the cache.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Jan 11 '07 #15

wizofaus

Erland Sommarskog wrote:

When you set a database to forced parameterisation, SQL Server will
auto-parameterise all statements (with some exceptions documented in
Books Online); normally it only auto-parameterise very simple queries.
In the case of Mgmt Studio it's reallly a go-faster switch.

So dismissing caching of parameterised queries can be a serious mistake.
But it is certainly true that there are situations where parameter
sniffing can be a problem. If it is possible for you to tell in the
appliocation "this is an odd value that needs a special plan", then
you can modify the query text by adding a redudant condition like
"AND 1 = 1". Actually as you have found, changing "BY" to "by" or even
adding extra spaces help. This is because the lookup in the cache
is done on a hash without first collapsing spaces or parsing the
query text.

Thanks...one of the most helpful replies I've had on usenet for some
time now!

The problem is that it's pretty hard for me to know that a value is
"odd". In this case, like I said the query in this case is

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0

Where @0 is actually a value from a list that is obtained from
elsewhere (not the database). It loops through this list, calling the
same query for each one.

Now, I suppose I could first do

SELECT Count(*) FROM MyTable WHERE MyKey = @0

and determine if the number was very low, and if so, de-parameterize it
or add a space or whatever, but then this second query would
potentially suffer from the same problem.

I suppose another alternative is to build another query first

SELECT MyKey, Min(MyValue) FROM MyTable GROUP BY MyKey

then use this to obtain the minimum value for each key, but there's
only so much time I can spend rewriting queries to side-step quirky
performance issues (the application has many many ad-hoc queries,
nearly all parameterized on the same key, so they are all prone to the
same problem).

BTW, this is under SQL server 2000. I've yet to determine if this
particular problem does actually exist under 2005, which is what we're
using for the production environment. Will definitely do that before I
waste too much more time on it.

Jan 12 '07 #16

Erland Sommarskog

(wi******@hotmail.com) writes:

Thanks...one of the most helpful replies I've had on usenet for some
time now!

The problem is that it's pretty hard for me to know that a value is
"odd".

I can understand that this is not always simple. I didn't say this, in
hope it would be. :-)

However, I think I have a cure for you:

In this case, like I said the query in this case is

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0

Where @0 is actually a value from a list that is obtained from
elsewhere (not the database). It loops through this list, calling the
same query for each one.

Stop! Don't do that! The problems with query plans aside, this is an
ineffecient use of SQL Server. Get all data at once with:

SELECT t.MyKey, Min(t.MyValue)
FROM MyTable t
JOIN list_to_table(@list) f ON t.MyKey = f.value
GROUP BY t.MyKey

Where list_to_table is a table-valued function that transform the list
to a table. I have a whole bunch of such functions on
http://www.sommarskog.se/arrays-in-sql.html.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Jan 12 '07 #17

Cor Ligthert [MVP]

Stephen,

You forget in the list, the version of the framework that is used. It is to
AdoNet as well you know.

:-)
Cor

"Stephen Howe" <stephenPOINThoweATtns-globalPOINTcomschreef in bericht
news:%2*****************@TK2MSFTNGP04.phx.gbl...

>a) why is this happening? Is it a bug in my code, or in SQL server?
b) is it worth detecting it and fixing it automatically? I.e, should I
put some code in that notices that a query is running far too slowly,
then runs "DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE"? Or will that
cause other problems?

Pardon me but

Why did you not say which version of SQL Server you are running on?
And why not mention the version of MDAC you have installed (since you
posted
to microsoft.public.data.ado)?
And why not mention something about the Primary Keys & Indexes of the
table
you are querying - I take it you have an index on MyKey?
And have you checked the documentation, whitepapers, MSDN for details on
query performance?
And have you checked the query plan in Query Analyser?

Cheers

Stephen Howe

Jan 13 '07 #18

wizofaus

Erland Sommarskog wrote:

(wi******@hotmail.com) writes:
Thanks...one of the most helpful replies I've had on usenet for some
time now!

The problem is that it's pretty hard for me to know that a value is
"odd".

I can understand that this is not always simple. I didn't say this, in
hope it would be. :-)

However, I think I have a cure for you:

In this case, like I said the query in this case is

SELECT Min(MyValue) FROM MyTable WHERE MyKey = @0

Where @0 is actually a value from a list that is obtained from
elsewhere (not the database). It loops through this list, calling the
same query for each one.

Stop! Don't do that! The problems with query plans aside, this is an
ineffecient use of SQL Server. Get all data at once with:

SELECT t.MyKey, Min(t.MyValue)
FROM MyTable t
JOIN list_to_table(@list) f ON t.MyKey = f.value
GROUP BY t.MyKey

Where list_to_table is a table-valued function that transform the list
to a table. I have a whole bunch of such functions on
http://www.sommarskog.se/arrays-in-sql.html.

I agree this sort of thing would be preferable, and more efficient, but
as it is, providing I don't get the bad plan problem, it's efficient
enough as it is, and I'm wary of making too many big changes at this
point. That sort of thing I'd prefer to leave for the next version.

The thing is, I just tried exporting all the records to a CSV file,
creating a clean database with the same tables/indices, re-importing
all the records, and the same problem DOESN'T happen: I can run that
query with the "odd" value first, then with the common value, and both
queries run fast. So something else has happened to my database that
surely I must be able to reset somehow?

Jan 13 '07 #19

Erland Sommarskog

(wi******@hotmail.com) writes:

I agree this sort of thing would be preferable, and more efficient, but
as it is, providing I don't get the bad plan problem, it's efficient
enough as it is, and I'm wary of making too many big changes at this
point. That sort of thing I'd prefer to leave for the next version.

But keep in mind that the solution you have now will not scale well. If
the data in production is ten times larger than you have expected, you
will get ten times more execution time, even with the good plan.

The thing is, I just tried exporting all the records to a CSV file,
creating a clean database with the same tables/indices, re-importing
all the records, and the same problem DOESN'T happen: I can run that
query with the "odd" value first, then with the common value, and both
queries run fast. So something else has happened to my database that
surely I must be able to reset somehow?

I eavesdropped a discussion at PASS in Seattle last year, when a guy
had done extensive tests, and he could repeat a scenario that depending
on which order he loaded the same data, he would get different plans,
good or bad. I presume that part of the answer lies what exactly is in
the statistics. Normally, statistics are only samples, and if the
statistics does not well reflect the data distribution, your plans
will not always be the best.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Jan 13 '07 #20

wizofaus

Erland Sommarskog wrote:

(wi******@hotmail.com) writes:
I agree this sort of thing would be preferable, and more efficient, but
as it is, providing I don't get the bad plan problem, it's efficient
enough as it is, and I'm wary of making too many big changes at this
point. That sort of thing I'd prefer to leave for the next version.

But keep in mind that the solution you have now will not scale well. If
the data in production is ten times larger than you have expected, you
will get ten times more execution time, even with the good plan.

Sure. I definitely plan on doing some query optimization and
consolidation for the next version. Your routines may well come in
handy, so thanks.

>
The thing is, I just tried exporting all the records to a CSV file,
creating a clean database with the same tables/indices, re-importing
all the records, and the same problem DOESN'T happen: I can run that
query with the "odd" value first, then with the common value, and both
queries run fast. So something else has happened to my database that
surely I must be able to reset somehow?

I eavesdropped a discussion at PASS in Seattle last year, when a guy
had done extensive tests, and he could repeat a scenario that depending
on which order he loaded the same data, he would get different plans,
good or bad. I presume that part of the answer lies what exactly is in
the statistics. Normally, statistics are only samples, and if the
statistics does not well reflect the data distribution, your plans
will not always be the best.

Well I found another solution - reindex the table.

I ran

dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the
problem is gone away. My main concern was that if we did see this
problem occuring in productoion databases, how could we fix it, other
than changing the code, and at least now I have such a solution, and
it's a bit less drastic than exporting and re-importing all the data
(which potentially could have taken hours).

I'm not sure whether 90 is the best parameter value here, that was just
from the MSDN article.

Jan 14 '07 #21

Erland Sommarskog

(wi******@hotmail.com) writes:

Well I found another solution - reindex the table.

I ran

dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the
problem is gone away. My main concern was that if we did see this
problem occuring in productoion databases, how could we fix it, other
than changing the code, and at least now I have such a solution, and
it's a bit less drastic than exporting and re-importing all the data
(which potentially could have taken hours).

It's recommended to run a maintenance job to reindex the table with some
frequency. The main reason for this is to prevent defragmentation. A side
effect of this is that statistics are updated with fullscan, that is all
rows are considered. That gives the optimizer more accurate information.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Jan 14 '07 #22

JXStern

On 13 Jan 2007 18:15:11 -0800, wi******@hotmail.com wrote:

>Well I found another solution - reindex the table.

I ran

dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the
problem is gone away. My main concern was that if we did see this
problem occuring in productoion databases, how could we fix it, other
than changing the code, and at least now I have such a solution, and
it's a bit less drastic than exporting and re-importing all the data
(which potentially could have taken hours).

How large is your table?

Maybe a simple "update statistics" would also fix things?

J.

Jan 15 '07 #23

wizofaus

JXStern wrote:

On 13 Jan 2007 18:15:11 -0800, wi******@hotmail.com wrote:

Well I found another solution - reindex the table.

I ran

dbcc dbreindex('mytable', ' ', 90) on the initial database, and now the
problem is gone away. My main concern was that if we did see this
problem occuring in productoion databases, how could we fix it, other
than changing the code, and at least now I have such a solution, and
it's a bit less drastic than exporting and re-importing all the data
(which potentially could have taken hours).

How large is your table?

'bout 2 million records, 9 columns.

>
Maybe a simple "update statistics" would also fix things?

Quite possibly - unfortunately I can't re-create the problem now to
test it!
Will be the first thing I try if I see the same problem again, though.

Jan 15 '07 #24

JXStern

On 15 Jan 2007 13:36:21 -0800, wi******@hotmail.com wrote:

>How large is your table?

'bout 2 million records, 9 columns.
>>
Maybe a simple "update statistics" would also fix things?

Quite possibly - unfortunately I can't re-create the problem now to
test it!
Will be the first thing I try if I see the same problem again, though.

Could probably reproduce it - insert a 1,000,000 rows = 1, then a
handfull of rows numbered 2-10. Query it for =1, it will scan, then
query it for =2. Or something like that.

J.

Jan 15 '07 #25

wizofaus

JXStern wrote:

On 15 Jan 2007 13:36:21 -0800, wi******@hotmail.com wrote:

How large is your table?
'bout 2 million records, 9 columns.
>
Maybe a simple "update statistics" would also fix things?

Quite possibly - unfortunately I can't re-create the problem now to
test it!
Will be the first thing I try if I see the same problem again, though.

Could probably reproduce it - insert a 1,000,000 rows = 1, then a
handfull of rows numbered 2-10. Query it for =1, it will scan, then
query it for =2. Or something like that.

Not sure, but now I have another issue which does seem to point towards
parameters being a problem.

If I run a query with quite a large number of parameters which
basically

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN (@13, @14, @15...@20)

it takes over 2 seconds, but when I substitute the last 8 parameters
(13-20) with their values, i.e.

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN ('value1', 'value2', 'value3', 'value4'...'value8')

the query takes 15 milliseconds! (actually it took slightly longer -
about 400msec - on the first run - the previous query always takes ~2
seconds no matter how often I run it).

Curiously, the fewer parameters I substitute for the Key2 filter, the
slower it runs (I can't improve on the 15 msec by doing the same for
they Key1 filter), in a more or less linear relationship.

I've tried all the previously mentioned "tricks", including reindexing
the table, but no luck.

So it seems maybe SQL server has problems optimizing if there are too
many parameters.
FWIW, I did try putting the parameter values in a temporary table and
even using a join, but the total time is significantly longer than
15ms.

Again, the parameter values are under my control, so there's no risk of
SQL injection, so if there's another good reason NOT to do my own
parameter substitution in this case, I'd be interested in hearing it.

Jan 18 '07 #26

Bob Barrows [MVP]

wi******@hotmail.com wrote:

Not sure, but now I have another issue which does seem to point
towards parameters being a problem.

If I run a query with quite a large number of parameters which
basically

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN (@13, @14, @15...@20)

it takes over 2 seconds, but when I substitute the last 8 parameters
(13-20) with their values, i.e.

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN ('value1', 'value2', 'value3', 'value4'...'value8')

the query takes 15 milliseconds! (actually it took slightly longer -
about 400msec - on the first run - the previous query always takes ~2
seconds no matter how often I run it).

Now you've got me really intrigued. Based on everything I've ever read, IN
comparisons are supposed to be non-sargable, and therefore non-optimizable.
Someone from the SQL Server groups please correct me if I am wrong. Oh wait,
maybe you have a sufficient number of values to cause the query engine to
use a temp table and join, in which case optimization can occur.

I suggest you use SQL Profiler to determne the difference in the execution
plans for each method. If the fast query plan involves an index that is not
used in the slow query plan, you can use an index hint to force the query
engine to use that index
(http://www.sql-server-performance.co..._sql_where.asp)

>
Curiously, the fewer parameters I substitute for the Key2 filter, the
slower it runs (I can't improve on the 15 msec by doing the same for
they Key1 filter), in a more or less linear relationship.

I've tried all the previously mentioned "tricks", including reindexing
the table, but no luck.

What about the trick that involves using local variables in your batch,
instead of directly using the parameters:

declare @tmp1, @tmp2, etc.
set @tmp1=@parm1
etc.
SELECT ...(@tmp1, ...)

Personally, I would do this in a stored procedure, but you have ignored
previous advice to use stored procedures.

>
So it seems maybe SQL server has problems optimizing if there are too
many parameters.
FWIW, I did try putting the parameter values in a temporary table and
even using a join, but the total time is significantly longer than
15ms.

ISTR reading somewhere that with a sufficient number of values, that the
query engine does this anyways behind the scenes.

>
Again, the parameter values are under my control, so there's no risk
of SQL injection,

I always cringe when I hear somebody say this. Unless those values are
hard-coded into your application code, you have to be getting those values
from somewhere. If a user was involved at any point in the process that
generates those values, then you need to at least entertain the possibility
that some funny business may have occurred. Look in these articles for
"secondary sql injection". It is always a mistake to assume that your user
base is too ignorant to take advantage of these techniques:
http://mvp.unixwiz.net/techtips/sql-injection.html
http://www.sqlsecurity.com/DesktopDefault.aspx?tabid=23
http://www.nextgenss.com/papers/adva..._injection.pdf
http://www.nextgenss.com/papers/more..._injection.pdf

The bottom line may turn out to be that you need to choose between secure
and fast.

--
Microsoft MVP - ASP/ASP.NET
Please reply to the newsgroup. This email account is my spam trap so I
don't check it very often. If you must reply off-line, then remove the
"NO SPAM"

Jan 18 '07 #27

Gert-Jan Strik

wi******@hotmail.com wrote:

>
JXStern wrote:
On 15 Jan 2007 13:36:21 -0800, wi******@hotmail.com wrote:

>How large is your table?
>
>'bout 2 million records, 9 columns.
>>
>Maybe a simple "update statistics" would also fix things?
>>
>Quite possibly - unfortunately I can't re-create the problem now to
>test it!
>Will be the first thing I try if I see the same problem again, though.
Could probably reproduce it - insert a 1,000,000 rows = 1, then a
handfull of rows numbered 2-10. Query it for =1, it will scan, then
query it for =2. Or something like that.
Not sure, but now I have another issue which does seem to point towards
parameters being a problem.

If I run a query with quite a large number of parameters which
basically

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN (@13, @14, @15...@20)

it takes over 2 seconds, but when I substitute the last 8 parameters
(13-20) with their values, i.e.

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN ('value1', 'value2', 'value3', 'value4'...'value8')

the query takes 15 milliseconds! (actually it took slightly longer -
about 400msec - on the first run - the previous query always takes ~2
seconds no matter how often I run it).

Indeed, that is because with literals will really compile the statement
based on the actual values. The optimizer will build a kind of binary
tree. It will also remove any duplicates (when applicable). So the
execution phase will be very fast.

On the other hand, the compilation phase is relatively expensive. And if
you have hundreds of values, the optimizer start to choke.

Curiously, the fewer parameters I substitute for the Key2 filter, the
slower it runs (I can't improve on the 15 msec by doing the same for
they Key1 filter), in a more or less linear relationship.

Explanation: see above

I've tried all the previously mentioned "tricks", including reindexing
the table, but no luck.

So it seems maybe SQL server has problems optimizing if there are too
many parameters.
FWIW, I did try putting the parameter values in a temporary table and
even using a join, but the total time is significantly longer than
15ms.

It is hard to optimizer a scenario like yours. You might try something
like this:

SELECT SUM(cnt)
FROM (
SELECT COUNT(*) AS cnt FROM MyTable WHERE Key1=@0
UNION ALL
SELECT COUNT(*) AS cnt FROM MyTable WHERE Key1=@1
UNION ALL
...
) AS T

Although the optimizer might automatically come up with a query plan
that reflects this strategy, I doubt that it actually will.

Please let me know if it actually increases your query performance.

HTH,
Gert-Jan

Jan 18 '07 #28

wizofaus

Bob Barrows [MVP] wrote:

wi******@hotmail.com wrote:
Not sure, but now I have another issue which does seem to point
towards parameters being a problem.

If I run a query with quite a large number of parameters which
basically

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN (@13, @14, @15...@20)

it takes over 2 seconds, but when I substitute the last 8 parameters
(13-20) with their values, i.e.

SELECT Count(*) FROM MyTable WHERE Key1 IN (@0, @1, @2...@12) AND Key2
IN ('value1', 'value2', 'value3', 'value4'...'value8')

the query takes 15 milliseconds! (actually it took slightly longer -
about 400msec - on the first run - the previous query always takes ~2
seconds no matter how often I run it).

Now you've got me really intrigued. Based on everything I've ever read, IN
comparisons are supposed to be non-sargable, and therefore non-optimizable.
Someone from the SQL Server groups please correct me if I am wrong. Oh wait,
maybe you have a sufficient number of values to cause the query engine to
use a temp table and join, in which case optimization can occur.

But that's the thing - I tried doing that explicitly myself, and it's
considerably slower.

>
I suggest you use SQL Profiler to determne the difference in the execution
plans for each method. If the fast query plan involves an index that is not
used in the slow query plan, you can use an index hint to force the query
engine to use that index
(http://www.sql-server-performance.co..._sql_where.asp)

Curiously, the fewer parameters I substitute for the Key2 filter, the
slower it runs (I can't improve on the 15 msec by doing the same for
they Key1 filter), in a more or less linear relationship.

I've tried all the previously mentioned "tricks", including reindexing
the table, but no luck.

What about the trick that involves using local variables in your batch,
instead of directly using the parameters:

declare @tmp1, @tmp2, etc.
set @tmp1=@parm1
etc.
SELECT ...(@tmp1, ...)

Personally, I would do this in a stored procedure, but you have ignored
previous advice to use stored procedures.

Actually I tried stored procs as well, and it didn't seem to be
helping. Also, can you write a stored proc to take a variable number
of parameters?

>

So it seems maybe SQL server has problems optimizing if there are too
many parameters.
FWIW, I did try putting the parameter values in a temporary table and
even using a join, but the total time is significantly longer than
15ms.

ISTR reading somewhere that with a sufficient number of values, that the
query engine does this anyways behind the scenes.

Again, the parameter values are under my control, so there's no risk
of SQL injection,

I always cringe when I hear somebody say this. Unless those values are
hard-coded into your application code, you have to be getting those values
from somewhere. If a user was involved at any point in the process that
generates those values, then you need to at least entertain the possibility
that some funny business may have occurred.

Perhaps, but in this case they are auto-generated - the user has no
control over what the actual values are (only how many there are).

At any rate, if the only thing to take into consideration here is
performance vs security, then I'm afraid performance does win. No-one
will use the application if it's a dog. And as it happens, the
database doesn't hold particularly sensitive (or irreplaceable) data
anyway.

Jan 18 '07 #29

Erland Sommarskog

Bob Barrows [MVP] (re******@NOyahoo.SPAMcom) writes:

Now you've got me really intrigued. Based on everything I've ever read,
IN comparisons are supposed to be non-sargable, and therefore
non-optimizable. Someone from the SQL Server groups please correct me if
I am wrong.

coi IN (val1, val2, ...)

is just a shortcut for

col1 = val1 OR col2 = val2 OR ...

and it's perfectly possible for the optimizer to work with IN to produce a
good plan. The main problem is that for many values, the optimization time
can exceed the real execution time by far.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Jan 18 '07 #30

Similar topics