473,405 Members | 2,300 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,405 software developers and data experts.

Optimizer isn't perfect


Hm, here's a query where the optimizer is choosing the wrong plan by far. I
think it boils down to it guessing wrong on how selective an rtree index is,
which I guess would be hard to predict.

Except if it guesses wrong by assuming it isn't selective it would be maybe
50% slower doing lots of index lookups instead of a more efficient full table
scan and join. If it guesses wrong by assuming it'll be very selective as it
is in this case then it's 1000% slower doing repeated full table scans.

This is the same thing someone else pointed out a while ago. The consequences
of guessing wrong and favouring a full table scan instead of index lookups
with nested loops are simply much more severe than the reverse.

Even if the full table scan was faster with one dataset I might prefer to use
the slower index lookup and live with the lower performance but be able to
guarantee that a sudden shift in data distribution or user behaviour wouldn't
suddenly cause a complete outage. I'm wondering if it doesn't make sense to go
into production with enable_seqscan = off.
slo=> explain analyze
SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where group_id = group.group_id)) as num_xxs
FROM group
WHERE geom2 @ make_box(-79.3885,43.6438,65)
AND earth_dist(geom, -79.3885,43.6438) < 65

;
slo-> slo-> slo-> slo-> slo-> slo->
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_group_geom on group (cost=0.00..130.62 rows=1 width=4) (actual time=69.53..7410.31 rows=125 loops=1)
Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double precision) + pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double precision)
SubPlan
-> Aggregate (cost=127.44..127.44 rows=1 width=4) (actual time=59.11..59.11 rows=1 loops=125)
-> Hash IN Join (cost=19.00..127.42 rows=7 width=4) (actual time=36.63..58.66 rows=11 loops=125)
Hash Cond: ("outer".thing_id = "inner".thing_id)
-> Seq Scan on xx_thing (cost=0.00..81.23 rows=5423 width=8) (actual time=0.03..32.64 rows=5423 loops=125)
-> Hash (cost=18.97..18.97 rows=15 width=4) (actual time=0.59..0.59 rows=0 loops=125)
-> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual time=0.21..0.42 rows=8 loops=125)
Index Cond: (group_id = $0)
Total runtime: 7424.11 msec
(12 rows)

slo=> slo=> set enable_seqscan = off;
SET

slo=> set enable_mergejoin = off;
SET

slo=> explain analyze
SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where group_id = group.group_id)) as num_xxs
FROM group
WHERE geom2 @ make_box(-79.3885,43.6438,65)
AND earth_dist(geom, -79.3885,43.6438) < 65

;
slo-> slo-> slo-> slo-> slo-> slo-> QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_group_geom on group (cost=0.00..149.93 rows=1 width=4) (actual time=0.74..239.90 rows=125 loops=1)
Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double precision) + pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double precision)
SubPlan
-> Aggregate (cost=146.75..146.75 rows=1 width=4) (actual time=1.81..1.81 rows=1 loops=125)
-> Nested Loop (cost=19.00..146.73 rows=7 width=4) (actual time=0.65..1.60 rows=11 loops=125)
-> HashAggregate (cost=19.00..19.00 rows=15 width=4) (actual time=0.36..0.63 rows=8 loops=125)
-> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual time=0.09..0.22 rows=8 loops=125)
Index Cond: (group_id = $0)
-> Index Scan using idx_xx_thing_loc on xx_thing (cost=0.00..8.43 rows=7 width=8) (actual time=0.05..0.08 rows=1 loops=1055)
Index Cond: (xx_thing.thing_id = "outer".thing_id)
Total runtime: 252.60 msec
(12 rows)

slo=> >

--
greg
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #1
2 1387
Greg Stark <gs*****@mit.edu> writes:
This is the same thing someone else pointed out a while ago. The consequences
of guessing wrong and favouring a full table scan instead of index lookups
with nested loops are simply much more severe than the reverse.


It is easy to demonstrate cases where this isn't true.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #2
On Fri, Aug 29, 2003 at 10:37:32AM -0400, Greg Stark wrote:
Except if it guesses wrong by assuming it isn't selective it would be maybe
50% slower doing lots of index lookups instead of a more efficient full table
scan and join. If it guesses wrong by assuming it'll be very selective as it
is in this case then it's 1000% slower doing repeated full table scans.


Well, there are operators that need some work. I am seeing less than
ideal behaviour with OVERLAPS, but I think it isn't even documented...

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"In a specialized industrial society, it would be a disaster
to have kids running around loose." (Paul Graham)

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Nick Mudge | last post by:
Hi, Does anybody know the performance difference between having your PHP code cached and just running your code with the zend optimizer? Is there much difference? Nick
9
by: hemal | last post by:
I came across a very strange situation at work. There is an order of magnitude difference in execution time for the following two queries (10 v/s ~130 msec): select count(*) from table_name...
2
by: Daniel Roy | last post by:
Guys, what I need is a tool which gives details on the choice of an execution plan by the SQL Server. For example, the cost for a hash join might be 200 and 100 for a nested loop, and therefore a...
3
by: Philip Yale | last post by:
I'm very puzzled by the choice of NC index being made by the optimizer in this example. I don't actually think it should use an NC index at all. I have: Table: CustomerStatus_T Single data...
14
by: Bupp Phillips | last post by:
I have a customer table that has the field CUSTOMER_ID as the primary key (cust_pkkey), the table has 102,834 records in it. The following select statement works fine: select * from customer...
3
by: Peter Arrenbrecht | last post by:
Hi all We ran into a very annoying optimizer problem recently. We had added a new key and self-join relation to a large production table. That key will be filled very rarely and having just...
9
by: Andrea | last post by:
Hi, I've read the former postings but was not able to solve my problem: I have a Summary Table (or MQT as you like it) and the query optimizer does not seem to use the summary table. I run...
2
by: boa sema | last post by:
Way back when, and at least in version 7 IIRC, the query optimizer gave up when the where clause in a statement contained more than 4 search conditions. Does anyone know if such a limitation...
3
by: aj | last post by:
DB2 LUW v8.2 FP 14 RHAS 2.1 Sorry if these are newbie questions. Optimizer stuff is black magic to me. For both of these, assume stats are current and an even distribution of data....
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.