Optimizer picks an ineffient plan - Page 2

Bupp Phillips

I have a customer table that has the field CUSTOMER_ID as the primary key
(cust_pkkey), the table has 102,834 records in it.
The following select statement works fine:

select * from customer order by customer_id;
QUERY PLAN:
Index Scan using cust_pkkey on customer (cost=0.00..517 5.17 rows=102834
width=724)
Total runtime: 5999.47 msec

but...

select * from customer order by customer_id, first_name;
QUERY PLAN:
Sort(cost=14202 8.25..142028.25 rows=102834 width=724)
-> Seq Scan on customer (cost=0.00..461 7.34 rows=102834 width=724)
Total runtime: 19999.81 msec
It seems that the optimizer should be able to detect (in this instance at
least) that the first order by field is a primary key and should not
consider the other fields because it's pointless... the resultset will be in
<primary key> order.

NOTE: I'm testing this on Postgresql 7.2 for Windows, so this my have
already been dealt with.
Thanks and keep up the great work!!

Nov 11 '05

Subscribe Reply

1979

Greg Stark

Stephan Szabo <ss****@megazon e.bigpanda.com> writes:

On Wed, 3 Sep 2003, Bupp Phillips wrote:
Well, it's unfortunate that you feel that way, because SQL Server handles it
correctly.

For some definition of correctly. If you're in a system which gets
penalized .001 seconds for each query planning that uses a multi-column
order by and you do 100 million of them that this doesn't apply to, and
one that it does which save you 30 seconds, is that correct?

Uhm. Yes. Absolutely.

For OLTP systems like a web site if there's a single query anywhere in the
system that takes 30s the system is broken. Every time that query happens to
be hit a few times in a row the OLTP system will simply break. This isn't a
performance issue, it's outright broken.

Whereas a 1ms performance hit on every query plan will make virtually no
difference at all in performance and most importantly, it will work. At worst
it means provisioning a 1% faster machine. Even then only if my system isn't
preparing all these queries in advance, in which case I have bigger
performance and security issues than a 1ms per query hit.

For DSS systems handling queries that take minutes or hours to run the case is
even clearer. The 1ms hit is even less of an issue, and the 30s gain will
balloon and turn into minutes or hours of speedup.

I'm pretty sure this particular case was not one of the cases where people
said it just wasn't worth doing. This was just too hard. Solving it in a way
that integrates cleanly with postgres's aggregates will be very hard and
require someone who understands lots of different parts of postgres to spend
an awful lot of time thinking about the problem.

[This is one of the strength's of free software. There's no marketing
department providing checklists that have to be met even if there's no good
solution today. So instead of a bad solution that sticks around for a long
time, we'll one day (hopefully) have a good solution when someone figures out
how to do it.]

--
greg
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 11 '05 #11

Stephan Szabo

On 5 Sep 2003, Greg Stark wrote:

Stephan Szabo <ss****@megazon e.bigpanda.com> writes:
On Wed, 3 Sep 2003, Bupp Phillips wrote:
Well, it's unfortunate that you feel that way, because SQL Server handles it
correctly.
For some definition of correctly. If you're in a system which gets
penalized .001 seconds for each query planning that uses a multi-column
order by and you do 100 million of them that this doesn't apply to, and
one that it does which save you 30 seconds, is that correct?

Uhm. Yes. Absolutely.

Even if all the people that don't write queries with redundant sort
columns pay the cost for those that do? In some ways it'd be nice if we
could control the planner more so that individual systems could make
determinations of whether this was useful or not.

From the below, I'm not sure we're talking about the same case any longer.
:)
I'm pretty sure this particular case was not one of the cases where people
said it just wasn't worth doing. This was just too hard. Solving it in a way
that integrates cleanly with postgres's aggregates will be very hard and
require someone who understands lots of different parts of postgres to spend
an awful lot of time thinking about the problem.

I'm not sure how finding redundant sort columns due to unique constraints
really integrates with aggregates at all. Did we maybe cross conversation
with one of the aggregate discussions on min/max/count?
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #12

Stephan Szabo

On Thu, 4 Sep 2003, Relaxin wrote:

What the heck are you talking about?
Whether or not trying every optimization is worthwhile in every potential
situation where it might apply, in theory.
SQLServer returns the query immediately (milliseconds) where as PGSQL
returns 6 seconds!!
So, yes I would say that SQLServer did it correctly in this situation. The
optimizer did what it was suppose to do...find the quickest way to get me
the data.
*If* you had other queries that took 6 seconds rather than milliseconds
because it was trying to work out if the optimization applied, would you
still think it was correct to be trying the optimization on those queries?
What about if it added 50 ms to every query you did with an order by, and
you never used redundant sort columns?
Now if you guys here a PGSQL don't believe it's necessary to change the
optimizer for this purpose, that's fine, but don't start putting other DBMS
down because they DO handle this situation.
Where did I put down another DBMS? (I've left my text below). I said (not
in so many words), some optimizations take time, always doing every
available optimization may not be "correct" because it may affect the
running time of other queries adversely.

I don't really know if this particular case warrents doing, I don't know
if the time involved in checking it is entirely negligible or not for
complicated queries. I do know that there are people who use order by
however and do not use redundant sort columns and that if there is a
non-negligible cost, they're the ones that are really going to be paying
for it. Tom believed that the cost was non-negligible for its benefit, if
someone else wants to do it and get numbers and the costs are negligible,
it's likely to get put in.
"Stephan Szabo" <ss****@megazon e.bigpanda.com> wrote in message
news:20******** *************** ***@megazone.bi gpanda.com...
On Wed, 3 Sep 2003, Bupp Phillips wrote:
Well, it's unfortunate that you feel that way, because SQL Server handles it correctly.

For some definition of correctly. If you're in a system which gets
penalized .001 seconds for each query planning that uses a multi-column
order by and you do 100 million of them that this doesn't apply to, and
one that it does which save you 30 seconds, is that correct?

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #13

Relaxin

Regardless of your argument, all I know is that SQL Server handles it
correctly (giving me the data in the fastest possible way) and postgresql
doesn't.
And your argument about how it will increase other queries is pointless to
me. Will you still stand by this argument when the PG folks find a situation
where the optimizer creates a terrible plan and the only way to fix is to
add additional logic, based on what you are saying, it's not worth it
because it could add .001 seconds of additional processing.
Again, like I said, if the PG folks don't see a need for this type of
optimization, then that's fine. What I'm stating is that SQLServer is a very
fast and robust database that also handles this, as you may call it, odd
situation.

Thanks
"Stephan Szabo" <ss****@megazon e.bigpanda.com> wrote in message
news:20******** *************** ***@megazone.bi gpanda.com...

On Thu, 4 Sep 2003, Relaxin wrote:
What the heck are you talking about?

Whether or not trying every optimization is worthwhile in every potential
situation where it might apply, in theory.
SQLServer returns the query immediately (milliseconds) where as PGSQL
returns 6 seconds!!
So, yes I would say that SQLServer did it correctly in this situation. The optimizer did what it was suppose to do...find the quickest way to get me the data.

*If* you had other queries that took 6 seconds rather than milliseconds
because it was trying to work out if the optimization applied, would you
still think it was correct to be trying the optimization on those queries?
What about if it added 50 ms to every query you did with an order by, and
you never used redundant sort columns?
Now if you guys here a PGSQL don't believe it's necessary to change the
optimizer for this purpose, that's fine, but don't start putting other DBMS down because they DO handle this situation.

Where did I put down another DBMS? (I've left my text below). I said (not
in so many words), some optimizations take time, always doing every
available optimization may not be "correct" because it may affect the
running time of other queries adversely.

I don't really know if this particular case warrents doing, I don't know
if the time involved in checking it is entirely negligible or not for
complicated queries. I do know that there are people who use order by
however and do not use redundant sort columns and that if there is a
non-negligible cost, they're the ones that are really going to be paying
for it. Tom believed that the cost was non-negligible for its benefit, if
someone else wants to do it and get numbers and the costs are negligible,
it's likely to get put in.
"Stephan Szabo" <ss****@megazon e.bigpanda.com> wrote in message
news:20******** *************** ***@megazone.bi gpanda.com...
On Wed, 3 Sep 2003, Bupp Phillips wrote:

> Well, it's unfortunate that you feel that way, because SQL Server

handles it
> correctly.

For some definition of correctly. If you're in a system which gets
penalized .001 seconds for each query planning that uses a multi-column order by and you do 100 million of them that this doesn't apply to, and one that it does which save you 30 seconds, is that correct?

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #14

Stephan Szabo

On Fri, 5 Sep 2003, Relaxin wrote:

And your argument about how it will increase other queries is pointless to
me. Will you still stand by this argument when the PG folks find a situation
where the optimizer creates a terrible plan and the only way to fix is to
add additional logic, based on what you are saying, it's not worth it
because it could add .001 seconds of additional processing.

I've tried to say that you need to balance the cost of the optimization
against its potential gain in the cases it helps, the frequency of those
cases, the ability to solve the problems other ways and against the
development time that is used in it. There isn't a push button with two
states, best and not best.

When you want to argue about an optimization you should think about:
a) What does the optimization actually mean (actual specification, not
vague words)
b) What cases is the optimization legal for (doesn't change results,
doesn't violate spec, etc)
c) What cases does the optimization help (as separate from b in that some
cases may not actually be faster but the optimization is legal)
d) What is the gain over the cases that the optimization helps
e) What is the penalty over the cases that the optimization does not help

If someone feels strongly about it or feels that they have the time,
they can (and should be generally promoted to) attempt the optimization.
If the optimization is not expensive or has better than expected gains or
good side effects, it's likely to get accepted.
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 11 '05 #15

Similar topics

4752

Query Optimizer does not ignore constant condition?

by: hemal | last post by:

I came across a very strange situation at work. There is an order of magnitude difference in execution time for the following two queries (10 v/s ~130 msec): select count(*) from table_name where column_name = 'value' or 1 = 0 select count(*) from table_name where column_name = 'value' I do not want to go into the reason why the redundent condition exists, the example is representative of the real query where it

Oracle Database

2355

pgsql optimizer

by: gefek | last post by:

Hello, I've got a server postgresql 7.4.6 installed from RPM for linux RH 9. Lately, I've noticed, that some queries last too long... It appears to me, that the optimizer does not use index optimizing.... This is what analyze shows: ------ explain analyze update activities set act_synch_date='2005-02-03 00:00:00' where activities.act_id=17; QUERY PLAN...

PostgreSQL Database

1728

Optimizer Logic Tracing Tool

by: Daniel Roy | last post by:

Guys, what I need is a tool which gives details on the choice of an execution plan by the SQL Server. For example, the cost for a hash join might be 200 and 100 for a nested loop, and therefore a nested loop is used. Same thing for the access paths for each table/view involved. In Oracle, we turn on event 100053 to see this kind of info. Thanx Daniel

Microsoft SQL Server

1597

Weird Optimizer Choice

by: Philip Yale | last post by:

I'm very puzzled by the choice of NC index being made by the optimizer in this example. I don't actually think it should use an NC index at all. I have: Table: CustomerStatus_T Single data page 19 records Clustered Index on CustomerStatusID:

Microsoft SQL Server

1419

Optimizer isn't perfect

by: Greg Stark | last post by:

Hm, here's a query where the optimizer is choosing the wrong plan by far. I think it boils down to it guessing wrong on how selective an rtree index is, which I guess would be hard to predict. Except if it guesses wrong by assuming it isn't selective it would be maybe 50% slower doing lots of index lookups instead of a more efficient full table scan and join. If it guesses wrong by assuming it'll be very selective as it is in this case...

PostgreSQL Database

2806

DB2 Optimizer is not using Summary Table (DB2 V 8.1.7)

by: Andrea | last post by:

Hi, I've read the former postings but was not able to solve my problem: I have a Summary Table (or MQT as you like it) and the query optimizer does not seem to use the summary table. I run the following commands: set current query optimization 9;

DB2 Database

1240

by: Larry Bertolini | last post by:

While trying to help a developer troubleshoot a performance problem, I have discovered something that strikes me as odd. When I run a particular query, using a non-privileged login that has necessary access to the objects, the query runs in 3.5 hours. When I run the same query, but use a login that has sysadmin role, the query runs in 1 second. I compared the showplan output, and the "regular" user's execution plan

Microsoft SQL Server

3060

MQT table use by optimizer

by: Kevin | last post by:

Using a base table, a MQT table was created. With optimization - when querying the base table with calcuation that are already completed in the MQT - I would assume the optimizer would use the MQT table instead of the base table. What causes the optimizer to use the MQT tables rather than the tables as defined by the query? Is there a way to "encourage" use of the MQT tables? Thanks.

DB2 Database

3907

MS SQL 2005: Where clause and optimizer

by: boa sema | last post by:

Way back when, and at least in version 7 IIRC, the query optimizer gave up when the where clause in a statement contained more than 4 search conditions. Does anyone know if such a limitation still exist in MS SQL 2005? The BOL seems to be silent on the issue. Boa

Microsoft SQL Server

9708

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

9588

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

10085

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...

General

6857

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5527

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

5663

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

4302

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

3828

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

2999

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General