Optimizer picks an ineffient plan

Bupp Phillips

I have a customer table that has the field CUSTOMER_ID as the primary key
(cust_pkkey), the table has 102,834 records in it.
The following select statement works fine:

select * from customer order by customer_id;
QUERY PLAN:
Index Scan using cust_pkkey on customer (cost=0.00..517 5.17 rows=102834
width=724)
Total runtime: 5999.47 msec

but...

select * from customer order by customer_id, first_name;
QUERY PLAN:
Sort(cost=14202 8.25..142028.25 rows=102834 width=724)
-> Seq Scan on customer (cost=0.00..461 7.34 rows=102834 width=724)
Total runtime: 19999.81 msec
It seems that the optimizer should be able to detect (in this instance at
least) that the first order by field is a primary key and should not
consider the other fields because it's pointless... the resultset will be in
<primary key> order.

NOTE: I'm testing this on Postgresql 7.2 for Windows, so this my have
already been dealt with.
Thanks and keep up the great work!!

Nov 11 '05 #1

Subscribe Reply

1977

Greg Stark

"Bupp Phillips" <he***@noname.c om> writes:

but...

select * from customer order by customer_id, first_name;
QUERY PLAN:
Sort(cost=14202 8.25..142028.25 rows=102834 width=724)
-> Seq Scan on customer (cost=0.00..461 7.34 rows=102834 width=724)
Total runtime: 19999.81 msec

Actually in this case the optimizer would likely still use a sequential scan
even if it had an index it thought it could use. If you're going to be reading
the whole table anyways it'll be faster to read it in order than to jump all
around even if you have to sort it.

However you do have a point. In this case I don't think postgres even
considers using the index. Even if it would decide not to use it in this case
there could conceivably be cases where it would want to use it.

However I'm not sure I see a lot of cases where this would come up. Even in
automatically generated code, which is the usual cause of redundant things
like this, I don't think I've seen this particular combination ever come up.
--
greg
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #2

Tom Lane

Greg Stark <gs*****@mit.ed u> writes:

"Bupp Phillips" <he***@noname.c om> writes:
select * from customer order by customer_id, first_name;
[ where customer_id is the primary key ]
However you do have a point. In this case I don't think postgres even
considers using the index.
It will not, since the index does not appear to provide the correct sort
order.
However I'm not sure I see a lot of cases where this would come up.

Yes, that's the real crux of the matter. Should the optimizer spend
cycles on *every* query to detect cases where the user has written
useless sort keys? I've got grave doubts that it's a win. ISTM such
an optimization penalizes the folk who write their queries well to
benefit those who are careless.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #3

Bupp Phillips

Well, it's unfortunate that you feel that way, because SQL Server handles it
correctly.
"Tom Lane" <tg*@sss.pgh.pa .us> wrote in message
news:43******** *****@sss.pgh.p a.us...

Greg Stark <gs*****@mit.ed u> writes:
"Bupp Phillips" <he***@noname.c om> writes:
select * from customer order by customer_id, first_name;
[ where customer_id is the primary key ]

However you do have a point. In this case I don't think postgres even
considers using the index.

It will not, since the index does not appear to provide the correct sort
order.
However I'm not sure I see a lot of cases where this would come up.

Yes, that's the real crux of the matter. Should the optimizer spend
cycles on *every* query to detect cases where the user has written
useless sort keys? I've got grave doubts that it's a win. ISTM such
an optimization penalizes the folk who write their queries well to
benefit those who are careless.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #4

Greg Stark

Tom Lane <tg*@sss.pgh.pa .us> writes:

However I'm not sure I see a lot of cases where this would come up.

Yes, that's the real crux of the matter. Should the optimizer spend
cycles on *every* query to detect cases where the user has written
useless sort keys? I've got grave doubts that it's a win. ISTM such
an optimization penalizes the folk who write their queries well to
benefit those who are careless.

Well I'm sure the same arguments were made 30 years ago about optimizing
compilers. But thankfully the optimizer-geeks won the argument. As a result
these days we can more or less write our code in whatever form is most
readable and flexible. We can spend our time worrying about algorithmic
improvements and design abstraction, and not worry that the extra layer of
abstraction will introduce redundant code or inefficient expressions.

Already today we see database-independent toolkits that write SQL queries for
you. They introduce needless subqueries and other inefficiencies willy-nilly.

I argue that if the performance of the database is important down to a
sub-second constant term per query, then you're talking about an OLTP system
where all the queries ought to be prepared and the plans cached. If you're
talking about a system where queries are constructed ad-hoc for every
execution then you should be talking about a DSS system running batch jobs
where an extra few seconds spent optimizing could save you hours.

All that said I'm not sure the case at hand is a great example. I don't think
it would be a big performance loss, but the added code complexity for nothing
might be annoying. I don't see how even automatically generated code is likely
to generate this situation.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 11 '05 #5

Tom Lane

Greg Stark <gs*****@mit.ed u> writes:

Tom Lane <tg*@sss.pgh.pa .us> writes:
Yes, that's the real crux of the matter. Should the optimizer spend
cycles on *every* query to detect cases where the user has written
useless sort keys? I've got grave doubts that it's a win.
Well I'm sure the same arguments were made 30 years ago about optimizing
compilers. But thankfully the optimizer-geeks won the argument.

Um ... I *am* an optimizer-geek. You can find my name in the credits
for Bliss/11, which was the best optimizing compiler on the planet about
thirty years ago. I stand by my comment that there's a tradeoff between
the potential gain from an optimization and the time spent to find it.

PG is at a disadvantage compared to typical compilation scenarios, in
that a compiler assumes its output will be executed many times, while
SQL queries often are planned and then executed but once. There's been
some talk of working harder when planning a "prepared statement", but
so far I've not seen very many places where I'd really want to alter
the planner's behavior on that basis.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 11 '05 #6

Peter Childs

On Thu, 4 Sep 2003, Tom Lane wrote:

Greg Stark <gs*****@mit.ed u> writes:
Tom Lane <tg*@sss.pgh.pa .us> writes:
Yes, that's the real crux of the matter. Should the optimizer spend
cycles on *every* query to detect cases where the user has written
useless sort keys? I've got grave doubts that it's a win.

Well I'm sure the same arguments were made 30 years ago about optimizing
compilers. But thankfully the optimizer-geeks won the argument.

Um ... I *am* an optimizer-geek. You can find my name in the credits
for Bliss/11, which was the best optimizing compiler on the planet about
thirty years ago. I stand by my comment that there's a tradeoff between
the potential gain from an optimization and the time spent to find it.

PG is at a disadvantage compared to typical compilation scenarios, in
that a compiler assumes its output will be executed many times, while
SQL queries often are planned and then executed but once. There's been
some talk of working harder when planning a "prepared statement", but
so far I've not seen very many places where I'd really want to alter
the planner's behavior on that basis.

An intresting point. Perhaps storing some stats on Views would
help. Maybe adding a cache facility for views would speed some things up.
I don't really see anything against storing stats on "Prepared
Statements" and Views like we do on Tables.
Maybe indexs on View would be useful but keeping them uptodate
would be a hazard.

Peter Childs
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 11 '05 #7

Greg Stark

Tom Lane <tg*@sss.pgh.pa .us> writes:

Um ... I *am* an optimizer-geek.
That explains so much :)
I stand by my comment that there's a tradeoff between the potential gain
from an optimization and the time spent to find it.
Well there are always tradoffs in engineering. I'm just trying to push a
little bit in one direction and make you rethink your assumptions.

In the past postgres users were largely OLTP systems (largely web sites)
running ad-hoc queries that are parsed and executed every time and interpolate
parameters into the query string. That's crap. With the new FE binary
protocol, and a bit of a push to the driver writers a good high performance
(and secure) system ought to be able to run entirely prepared queries that are
prepared once per backend process and executed thousands of times.
PG is at a disadvantage compared to typical compilation scenarios, in
that a compiler assumes its output will be executed many times, while
SQL queries often are planned and then executed but once. There's been
some talk of working harder when planning a "prepared statement", but
so far I've not seen very many places where I'd really want to alter
the planner's behavior on that basis.

I think that's backwards actually. The queries where you would want to work
super extra hard, spending a few seconds or even minutes checking possible
plans are the ones that will run for hours. Those are more likely to be DSS
queries that are unprepared ad-hoc queries that will be executed only once.

For OLTP queries I think postgres can afford to not worry about small
(subsecond) constant-time optimizations even if they're unlikely to return big
benefits because OLTP systems should be running entirely prepared queries.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 11 '05 #8

Stephan Szabo

On Wed, 3 Sep 2003, Bupp Phillips wrote:

Well, it's unfortunate that you feel that way, because SQL Server handles it
correctly.
For some definition of correctly. If you're in a system which gets
penalized .001 seconds for each query planning that uses a multi-column
order by and you do 100 million of them that this doesn't apply to, and
one that it does which save you 30 seconds, is that correct?

"Tom Lane" <tg*@sss.pgh.pa .us> wrote in message
news:43******** *****@sss.pgh.p a.us...
Greg Stark <gs*****@mit.ed u> writes:
"Bupp Phillips" <he***@noname.c om> writes:
> select * from customer order by customer_id, first_name;
> [ where customer_id is the primary key ]

However you do have a point. In this case I don't think postgres even
considers using the index.

It will not, since the index does not appear to provide the correct sort
order.
However I'm not sure I see a lot of cases where this would come up.

Yes, that's the real crux of the matter. Should the optimizer spend
cycles on *every* query to detect cases where the user has written
useless sort keys? I've got grave doubts that it's a win. ISTM such
an optimization penalizes the folk who write their queries well to
benefit those who are careless.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #9

Relaxin

What the heck are you talking about?

SQLServer returns the query immediately (milliseconds) where as PGSQL
returns 6 seconds!!
So, yes I would say that SQLServer did it correctly in this situation. The
optimizer did what it was suppose to do...find the quickest way to get me
the data.
Now if you guys here a PGSQL don't believe it's necessary to change the
optimizer for this purpose, that's fine, but don't start putting other DBMS
down because they DO handle this situation.
"Stephan Szabo" <ss****@megazon e.bigpanda.com> wrote in message
news:20******** *************** ***@megazone.bi gpanda.com...

On Wed, 3 Sep 2003, Bupp Phillips wrote:
Well, it's unfortunate that you feel that way, because SQL Server handles it correctly.

For some definition of correctly. If you're in a system which gets
penalized .001 seconds for each query planning that uses a multi-column
order by and you do 100 million of them that this doesn't apply to, and
one that it does which save you 30 seconds, is that correct?

"Tom Lane" <tg*@sss.pgh.pa .us> wrote in message
news:43******** *****@sss.pgh.p a.us...
Greg Stark <gs*****@mit.ed u> writes:
> "Bupp Phillips" <he***@noname.c om> writes:
>> select * from customer order by customer_id, first_name;
>> [ where customer_id is the primary key ]

> However you do have a point. In this case I don't think postgres even > considers using the index.

It will not, since the index does not appear to provide the correct sort order.

> However I'm not sure I see a lot of cases where this would come up.

Yes, that's the real crux of the matter. Should the optimizer spend
cycles on *every* query to detect cases where the user has written
useless sort keys? I've got grave doubts that it's a win. ISTM such
an optimization penalizes the folk who write their queries well to
benefit those who are careless.

regards, tom lane

---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #10

Similar topics

4750

Query Optimizer does not ignore constant condition?

by: hemal | last post by:

I came across a very strange situation at work. There is an order of magnitude difference in execution time for the following two queries (10 v/s ~130 msec): select count(*) from table_name where column_name = 'value' or 1 = 0 select count(*) from table_name where column_name = 'value' I do not want to go into the reason why the redundent condition exists, the example is representative of the real query where it

Oracle Database

2354

pgsql optimizer

by: gefek | last post by:

Hello, I've got a server postgresql 7.4.6 installed from RPM for linux RH 9. Lately, I've noticed, that some queries last too long... It appears to me, that the optimizer does not use index optimizing.... This is what analyze shows: ------ explain analyze update activities set act_synch_date='2005-02-03 00:00:00' where activities.act_id=17; QUERY PLAN...

PostgreSQL Database

1728

Optimizer Logic Tracing Tool

by: Daniel Roy | last post by:

Guys, what I need is a tool which gives details on the choice of an execution plan by the SQL Server. For example, the cost for a hash join might be 200 and 100 for a nested loop, and therefore a nested loop is used. Same thing for the access paths for each table/view involved. In Oracle, we turn on event 100053 to see this kind of info. Thanx Daniel

Microsoft SQL Server

1596

Weird Optimizer Choice

by: Philip Yale | last post by:

I'm very puzzled by the choice of NC index being made by the optimizer in this example. I don't actually think it should use an NC index at all. I have: Table: CustomerStatus_T Single data page 19 records Clustered Index on CustomerStatusID:

Microsoft SQL Server

1419

Optimizer isn't perfect

by: Greg Stark | last post by:

Hm, here's a query where the optimizer is choosing the wrong plan by far. I think it boils down to it guessing wrong on how selective an rtree index is, which I guess would be hard to predict. Except if it guesses wrong by assuming it isn't selective it would be maybe 50% slower doing lots of index lookups instead of a more efficient full table scan and join. If it guesses wrong by assuming it'll be very selective as it is in this case...

PostgreSQL Database

2806

DB2 Optimizer is not using Summary Table (DB2 V 8.1.7)

by: Andrea | last post by:

Hi, I've read the former postings but was not able to solve my problem: I have a Summary Table (or MQT as you like it) and the query optimizer does not seem to use the summary table. I run the following commands: set current query optimization 9;

DB2 Database

1239

by: Larry Bertolini | last post by:

While trying to help a developer troubleshoot a performance problem, I have discovered something that strikes me as odd. When I run a particular query, using a non-privileged login that has necessary access to the objects, the query runs in 3.5 hours. When I run the same query, but use a login that has sysadmin role, the query runs in 1 second. I compared the showplan output, and the "regular" user's execution plan

Microsoft SQL Server

3057

MQT table use by optimizer

by: Kevin | last post by:

Using a base table, a MQT table was created. With optimization - when querying the base table with calcuation that are already completed in the MQT - I would assume the optimizer would use the MQT table instead of the base table. What causes the optimizer to use the MQT tables rather than the tables as defined by the query? Is there a way to "encourage" use of the MQT tables? Thanks.

DB2 Database

3907

MS SQL 2005: Where clause and optimizer

by: boa sema | last post by:

Way back when, and at least in version 7 IIRC, the query optimizer gave up when the where clause in a statement contained more than 4 search conditions. Does anyone know if such a limitation still exist in MS SQL 2005? The BOL seems to be silent on the issue. Boa

Microsoft SQL Server

9673

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

10443

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

10216

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...

Online Marketing

10002

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...

General

7543

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...

Microsoft Access / VBA

6783

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5565

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

4113

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

3728

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP