By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,372 Members | 1,906 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,372 IT Pros & Developers. It's quick & easy.

Optimizer isn't perfect

P: n/a

Hm, here's a query where the optimizer is choosing the wrong plan by far. I
think it boils down to it guessing wrong on how selective an rtree index is,
which I guess would be hard to predict.

Except if it guesses wrong by assuming it isn't selective it would be maybe
50% slower doing lots of index lookups instead of a more efficient full table
scan and join. If it guesses wrong by assuming it'll be very selective as it
is in this case then it's 1000% slower doing repeated full table scans.

This is the same thing someone else pointed out a while ago. The consequences
of guessing wrong and favouring a full table scan instead of index lookups
with nested loops are simply much more severe than the reverse.

Even if the full table scan was faster with one dataset I might prefer to use
the slower index lookup and live with the lower performance but be able to
guarantee that a sudden shift in data distribution or user behaviour wouldn't
suddenly cause a complete outage. I'm wondering if it doesn't make sense to go
into production with enable_seqscan = off.
slo=> explain analyze
SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where group_id = group.group_id)) as num_xxs
FROM group
WHERE geom2 @ make_box(-79.3885,43.6438,65)
AND earth_dist(geom, -79.3885,43.6438) < 65

;
slo-> slo-> slo-> slo-> slo-> slo->
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_group_geom on group (cost=0.00..130.62 rows=1 width=4) (actual time=69.53..7410.31 rows=125 loops=1)
Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double precision) + pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double precision)
SubPlan
-> Aggregate (cost=127.44..127.44 rows=1 width=4) (actual time=59.11..59.11 rows=1 loops=125)
-> Hash IN Join (cost=19.00..127.42 rows=7 width=4) (actual time=36.63..58.66 rows=11 loops=125)
Hash Cond: ("outer".thing_id = "inner".thing_id)
-> Seq Scan on xx_thing (cost=0.00..81.23 rows=5423 width=8) (actual time=0.03..32.64 rows=5423 loops=125)
-> Hash (cost=18.97..18.97 rows=15 width=4) (actual time=0.59..0.59 rows=0 loops=125)
-> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual time=0.21..0.42 rows=8 loops=125)
Index Cond: (group_id = $0)
Total runtime: 7424.11 msec
(12 rows)

slo=> slo=> set enable_seqscan = off;
SET

slo=> set enable_mergejoin = off;
SET

slo=> explain analyze
SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where group_id = group.group_id)) as num_xxs
FROM group
WHERE geom2 @ make_box(-79.3885,43.6438,65)
AND earth_dist(geom, -79.3885,43.6438) < 65

;
slo-> slo-> slo-> slo-> slo-> slo-> QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_group_geom on group (cost=0.00..149.93 rows=1 width=4) (actual time=0.74..239.90 rows=125 loops=1)
Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double precision) + pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double precision)
SubPlan
-> Aggregate (cost=146.75..146.75 rows=1 width=4) (actual time=1.81..1.81 rows=1 loops=125)
-> Nested Loop (cost=19.00..146.73 rows=7 width=4) (actual time=0.65..1.60 rows=11 loops=125)
-> HashAggregate (cost=19.00..19.00 rows=15 width=4) (actual time=0.36..0.63 rows=8 loops=125)
-> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual time=0.09..0.22 rows=8 loops=125)
Index Cond: (group_id = $0)
-> Index Scan using idx_xx_thing_loc on xx_thing (cost=0.00..8.43 rows=7 width=8) (actual time=0.05..0.08 rows=1 loops=1055)
Index Cond: (xx_thing.thing_id = "outer".thing_id)
Total runtime: 252.60 msec
(12 rows)

slo=> >

--
greg
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #1
Share this Question
Share on Google+
2 Replies


P: n/a
Greg Stark <gs*****@mit.edu> writes:
This is the same thing someone else pointed out a while ago. The consequences
of guessing wrong and favouring a full table scan instead of index lookups
with nested loops are simply much more severe than the reverse.


It is easy to demonstrate cases where this isn't true.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #2

P: n/a
On Fri, Aug 29, 2003 at 10:37:32AM -0400, Greg Stark wrote:
Except if it guesses wrong by assuming it isn't selective it would be maybe
50% slower doing lots of index lookups instead of a more efficient full table
scan and join. If it guesses wrong by assuming it'll be very selective as it
is in this case then it's 1000% slower doing repeated full table scans.


Well, there are operators that need some work. I am seeing less than
ideal behaviour with OVERLAPS, but I think it isn't even documented...

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"In a specialized industrial society, it would be a disaster
to have kids running around loose." (Paul Graham)

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #3

This discussion thread is closed

Replies have been disabled for this discussion.