Optimizer isn't perfect

Greg Stark

Hm, here's a query where the optimizer is choosing the wrong plan by far. I
think it boils down to it guessing wrong on how selective an rtree index is,
which I guess would be hard to predict.

Except if it guesses wrong by assuming it isn't selective it would be maybe
50% slower doing lots of index lookups instead of a more efficient full table
scan and join. If it guesses wrong by assuming it'll be very selective as it
is in this case then it's 1000% slower doing repeated full table scans.

This is the same thing someone else pointed out a while ago. The consequences
of guessing wrong and favouring a full table scan instead of index lookups
with nested loops are simply much more severe than the reverse.

Even if the full table scan was faster with one dataset I might prefer to use
the slower index lookup and live with the lower performance but be able to
guarantee that a sudden shift in data distribution or user behaviour wouldn't
suddenly cause a complete outage. I'm wondering if it doesn't make sense to go
into production with enable_seqscan = off.
slo=> explain analyze
SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where group_id = group.group_id)) as num_xxs
FROM group
WHERE geom2 @ make_box(-79.3885,43.6438,65)
AND earth_dist(geom, -79.3885,43.6438) < 65

;
slo-> slo-> slo-> slo-> slo-> slo->
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_group_geom on group (cost=0.00..130.62 rows=1 width=4) (actual time=69.53..7410.31 rows=125 loops=1)
Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double precision) + pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double precision)
SubPlan
-> Aggregate (cost=127.44..127.44 rows=1 width=4) (actual time=59.11..59.11 rows=1 loops=125)
-> Hash IN Join (cost=19.00..127.42 rows=7 width=4) (actual time=36.63..58.66 rows=11 loops=125)
Hash Cond: ("outer".thing_id = "inner".thing_id)
-> Seq Scan on xx_thing (cost=0.00..81.23 rows=5423 width=8) (actual time=0.03..32.64 rows=5423 loops=125)
-> Hash (cost=18.97..18.97 rows=15 width=4) (actual time=0.59..0.59 rows=0 loops=125)
-> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual time=0.21..0.42 rows=8 loops=125)
Index Cond: (group_id = $0)
Total runtime: 7424.11 msec
(12 rows)

slo=> slo=> set enable_seqscan = off;
SET

slo=> set enable_mergejoin = off;
SET

slo=> explain analyze
SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where group_id = group.group_id)) as num_xxs
FROM group
WHERE geom2 @ make_box(-79.3885,43.6438,65)
AND earth_dist(geom, -79.3885,43.6438) < 65

;
slo-> slo-> slo-> slo-> slo-> slo-> QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_group_geom on group (cost=0.00..149.93 rows=1 width=4) (actual time=0.74..239.90 rows=125 loops=1)
Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double precision) + pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double precision)
SubPlan
-> Aggregate (cost=146.75..146.75 rows=1 width=4) (actual time=1.81..1.81 rows=1 loops=125)
-> Nested Loop (cost=19.00..146.73 rows=7 width=4) (actual time=0.65..1.60 rows=11 loops=125)
-> HashAggregate (cost=19.00..19.00 rows=15 width=4) (actual time=0.36..0.63 rows=8 loops=125)
-> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual time=0.09..0.22 rows=8 loops=125)
Index Cond: (group_id = $0)
-> Index Scan using idx_xx_thing_loc on xx_thing (cost=0.00..8.43 rows=7 width=8) (actual time=0.05..0.08 rows=1 loops=1055)
Index Cond: (xx_thing.thing_id = "outer".thing_id)
Total runtime: 252.60 msec
(12 rows)

slo=> >

--
greg
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #1

Subscribe Post Reply

1387

Tom Lane

Greg Stark <gs*****@mit.edu> writes:

This is the same thing someone else pointed out a while ago. The consequences
of guessing wrong and favouring a full table scan instead of index lookups
with nested loops are simply much more severe than the reverse.

It is easy to demonstrate cases where this isn't true.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #2

Alvaro Herrera

On Fri, Aug 29, 2003 at 10:37:32AM -0400, Greg Stark wrote:

Except if it guesses wrong by assuming it isn't selective it would be maybe
50% slower doing lots of index lookups instead of a more efficient full table
scan and join. If it guesses wrong by assuming it'll be very selective as it
is in this case then it's 1000% slower doing repeated full table scans.

Well, there are operators that need some work. I am seeing less than
ideal behaviour with OVERLAPS, but I think it isn't even documented...

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"In a specialized industrial society, it would be a disaster
to have kids running around loose." (Paul Graham)

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #3

by: Nick Mudge | last post by:

Hi, Does anybody know the performance difference between having your PHP code cached and just running your code with the zend optimizer? Is there much difference? Nick

PHP

Query Optimizer does not ignore constant condition?

by: hemal | last post by:

I came across a very strange situation at work. There is an order of magnitude difference in execution time for the following two queries (10 v/s ~130 msec): select count(*) from table_name...

Oracle Database

Optimizer Logic Tracing Tool

by: Daniel Roy | last post by:

Guys, what I need is a tool which gives details on the choice of an execution plan by the SQL Server. For example, the cost for a hash join might be 200 and 100 for a nested loop, and therefore a...

Microsoft SQL Server

Weird Optimizer Choice

by: Philip Yale | last post by:

I'm very puzzled by the choice of NC index being made by the optimizer in this example. I don't actually think it should use an NC index at all. I have: Table: CustomerStatus_T Single data...

Microsoft SQL Server

Optimizer picks an ineffient plan

by: Bupp Phillips | last post by:

I have a customer table that has the field CUSTOMER_ID as the primary key (cust_pkkey), the table has 102,834 records in it. The following select statement works fine: select * from customer...

PostgreSQL Database

Optimizer bad with sparse keys?

by: Peter Arrenbrecht | last post by:

Hi all We ran into a very annoying optimizer problem recently. We had added a new key and self-join relation to a large production table. That key will be filled very rarely and having just...

DB2 Database

DB2 Optimizer is not using Summary Table (DB2 V 8.1.7)

by: Andrea | last post by:

Hi, I've read the former postings but was not able to solve my problem: I have a Summary Table (or MQT as you like it) and the query optimizer does not seem to use the summary table. I run...

DB2 Database

MS SQL 2005: Where clause and optimizer

by: boa sema | last post by:

Way back when, and at least in version 7 IIRC, the query optimizer gave up when the where clause in a statement contained more than 4 search conditions. Does anyone know if such a limitation...

Microsoft SQL Server

Optimizer questions

by: aj | last post by:

DB2 LUW v8.2 FP 14 RHAS 2.1 Sorry if these are newbie questions. Optimizer stuff is black magic to me. For both of these, assume stats are current and an even distribution of data....

DB2 Database

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

Similar topics