By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
437,903 Members | 1,104 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 437,903 IT Pros & Developers. It's quick & easy.

Any way to Convince postgres to push join clause inside subquery aggregate?

P: n/a

I find I often want to be able to do joins against views where the view are
aggregates on a column that has an index. Ie, something like

SELECT a.*, v.n
FROM a
JOIN (select a_id,count(*) as n group by a_id) as v USING (a_id)

where there's an index on b.a_id. assume there are other tables being joined
so I can't just move the aggregate to the outermost layer.

These queries often come about because I often have a fully normalized
structure with many-to-many relationships but sometimes only want to display
an aggregated view of the data. For example, a list of products with the
number in stock, number on order, number sold recently, etc.

It would be really great if postgres could notice that the index is still
useful and push the index lookup inside the aggregate. Working around this is
really awkward and makes it impossible to use real views to hide the details
from the queries. Usually working around it actually means bulding
denormalized tables to act as "materialized views" and building indexes on
those tables.

The frustrating part is that it seems like postgres is almost capable of doing
it. In a simple query it is fully capable of pushing a where clause inside the
aggregate.

But in a join query it doesn't. (I put the "limit 1" on the second table to
ensure it was the best table to use to drive the join.)

Is there some key detail I'm missing that would allow it to push the join
clause inside the aggregate and use the index? If so I can simplify a lot of
design by creating views for these aggregate views instead of creating
denormalized tables and hacking complex queries everywhere.

slo=> explain select * from (select foo_id,count(*) as n from foo_bar group by foo_id) as x where foo_id = 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Subquery Scan x (cost=0.00..25.24 rows=1 width=12)
-> GroupAggregate (cost=0.00..25.23 rows=1 width=4)
-> Index Scan using idx_foo_bar_foo on foo_bar (cost=0.00..25.08 rows=30 width=4)
Index Cond: (foo_id = 1)
(4 rows)

Time: 226.15 ms

slo=> explain
select *
from (select foo_id,count(*) as n from foo_bar group by foo_id) as x
join (select * from foo limit 1) as foo using (foo_id)
;

QUERY PLAN
---------------------------------------------------------------------------------------
Hash Join (cost=108.46..111.28 rows=1 width=227)
Hash Cond: ("outer".foo_id = "inner".foo_id)
-> Subquery Scan x (cost=108.35..110.36 rows=161 width=12)
-> HashAggregate (cost=108.35..108.75 rows=161 width=4)
-> Seq Scan on foo_bar (cost=0.00..81.23 rows=5423 width=4)
-> Hash (cost=0.11..0.11 rows=1 width=219)
-> Subquery Scan foo (cost=0.00..0.11 rows=1 width=219)
-> Limit (cost=0.00..0.10 rows=1 width=625)
-> Seq Scan on foo (cost=0.00..17.82 rows=182 width=625)
(9 rows)
--
greg
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #1
Share this Question
Share on Google+
2 Replies


P: n/a
Greg Stark <gs*****@mit.edu> writes:
slo=> explain
select *
from (select foo_id,count(*) as n from foo_bar group by foo_id) as x
join (select * from foo limit 1) as foo using (foo_id)
;


Why not put the subselect in the output list, if that's the kind of plan
you want?

regression=# explain select foo.*, (select count(*) from foo_bar where foo_id = foo.foo_id) as n from foo;
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..17102.50 rows=1000 width=8)
SubPlan
-> Aggregate (cost=17.08..17.08 rows=1 width=0)
-> Index Scan using foobi on foo_bar (cost=0.00..17.07 rows=5 width=0)
Index Cond: (foo_id = $0)
(5 rows)
regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #2

P: n/a
Tom Lane <tg*@sss.pgh.pa.us> writes:
Greg Stark <gs*****@mit.edu> writes:
slo=> explain
select *
from (select foo_id,count(*) as n from foo_bar group by foo_id) as x
join (select * from foo limit 1) as foo using (foo_id)
;


Why not put the subselect in the output list, if that's the kind of plan
you want?


Actually that's the way the query is right now. The problem arises because I
want to add a second column without duplicating the whole thing.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #3

This discussion thread is closed

Replies have been disabled for this discussion.