468,253 Members | 1,277 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 468,253 developers. It's quick & easy.

Any way to Convince postgres to push join clause inside subquery aggregate?


I find I often want to be able to do joins against views where the view are
aggregates on a column that has an index. Ie, something like

SELECT a.*, v.n
FROM a
JOIN (select a_id,count(*) as n group by a_id) as v USING (a_id)

where there's an index on b.a_id. assume there are other tables being joined
so I can't just move the aggregate to the outermost layer.

These queries often come about because I often have a fully normalized
structure with many-to-many relationships but sometimes only want to display
an aggregated view of the data. For example, a list of products with the
number in stock, number on order, number sold recently, etc.

It would be really great if postgres could notice that the index is still
useful and push the index lookup inside the aggregate. Working around this is
really awkward and makes it impossible to use real views to hide the details
from the queries. Usually working around it actually means bulding
denormalized tables to act as "materialized views" and building indexes on
those tables.

The frustrating part is that it seems like postgres is almost capable of doing
it. In a simple query it is fully capable of pushing a where clause inside the
aggregate.

But in a join query it doesn't. (I put the "limit 1" on the second table to
ensure it was the best table to use to drive the join.)

Is there some key detail I'm missing that would allow it to push the join
clause inside the aggregate and use the index? If so I can simplify a lot of
design by creating views for these aggregate views instead of creating
denormalized tables and hacking complex queries everywhere.

slo=> explain select * from (select foo_id,count(*) as n from foo_bar group by foo_id) as x where foo_id = 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Subquery Scan x (cost=0.00..25.24 rows=1 width=12)
-> GroupAggregate (cost=0.00..25.23 rows=1 width=4)
-> Index Scan using idx_foo_bar_foo on foo_bar (cost=0.00..25.08 rows=30 width=4)
Index Cond: (foo_id = 1)
(4 rows)

Time: 226.15 ms

slo=> explain
select *
from (select foo_id,count(*) as n from foo_bar group by foo_id) as x
join (select * from foo limit 1) as foo using (foo_id)
;

QUERY PLAN
---------------------------------------------------------------------------------------
Hash Join (cost=108.46..111.28 rows=1 width=227)
Hash Cond: ("outer".foo_id = "inner".foo_id)
-> Subquery Scan x (cost=108.35..110.36 rows=161 width=12)
-> HashAggregate (cost=108.35..108.75 rows=161 width=4)
-> Seq Scan on foo_bar (cost=0.00..81.23 rows=5423 width=4)
-> Hash (cost=0.11..0.11 rows=1 width=219)
-> Subquery Scan foo (cost=0.00..0.11 rows=1 width=219)
-> Limit (cost=0.00..0.10 rows=1 width=625)
-> Seq Scan on foo (cost=0.00..17.82 rows=182 width=625)
(9 rows)
--
greg
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 11 '05 #1
2 3433
Greg Stark <gs*****@mit.edu> writes:
slo=> explain
select *
from (select foo_id,count(*) as n from foo_bar group by foo_id) as x
join (select * from foo limit 1) as foo using (foo_id)
;


Why not put the subselect in the output list, if that's the kind of plan
you want?

regression=# explain select foo.*, (select count(*) from foo_bar where foo_id = foo.foo_id) as n from foo;
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..17102.50 rows=1000 width=8)
SubPlan
-> Aggregate (cost=17.08..17.08 rows=1 width=0)
-> Index Scan using foobi on foo_bar (cost=0.00..17.07 rows=5 width=0)
Index Cond: (foo_id = $0)
(5 rows)
regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #2
Tom Lane <tg*@sss.pgh.pa.us> writes:
Greg Stark <gs*****@mit.edu> writes:
slo=> explain
select *
from (select foo_id,count(*) as n from foo_bar group by foo_id) as x
join (select * from foo limit 1) as foo using (foo_id)
;


Why not put the subselect in the output list, if that's the kind of plan
you want?


Actually that's the way the query is right now. The problem arises because I
want to add a second column without duplicating the whole thing.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 11 '05 #3

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

7 posts views Thread by Dave | last post: by
4 posts views Thread by Anthony Robinson | last post: by
9 posts views Thread by shanevanle | last post: by
reply views Thread by NPC403 | last post: by
reply views Thread by kermitthefrogpy | last post: by
reply views Thread by zattat | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.