Dear group,
I got excellent help here a while back in optimizing a slow query, so I
thought I'd try again with another one...
I have a small table, around 3000 rows with two columns. One is a
timestamp and the other is an integer. Each day, a number of entries
are created, with integers ranging from 1 to 5.
I need to find how many entries with the integer of 3 are created per
day, so I do:
select date(tested) as testdate, count(*) as testcnt
from tests
where result = 3
group by date(tested)
order by date(tested);
This query runs fast, as expected. However, I also need the query to
return rows for the days where there aren't any integers of value 3
(but still integers of the other values.) So I do the following:
select date(t1.tested), t2.testcnt
from tests t1
left join (
select date(tested) as testdate, count(*) as testcnt
from tests
where result = 22
group by date(tested)
) t2
on t2.testdate = date(t1.tested)
group by date(t1.tested), t2.testcnt
order by date(t1.tested);
That is, I create a query that returns one date for each day and a
subquery that returns one date and count for each day that has integers
of value 3, and I join them on the dates.
Even though both the inner and outer queries run in a few microseconds
by themselves, when I join them up like this, the query takes three
seconds to run on a P4 2.5GHz.
The tests table has indexes on everything. I have vacuumed and
analyzed. This is the output from explain analyze:
QUERY PLAN
Group (cost=378748.88..378770.02 rows=282 width=20) (actual
time=3229.93..3234.55 rows=30 loops=1)
-> Sort (cost=378748.88..378755.93 rows=2819 width=20) (actual
time=3229.74..3231.94 rows=2819 loops=1)
Sort Key: date(t1.tested), t2.testcnt
-> Nested Loop (cost=263.38..378587.34 rows=2819 width=20)
(actual time=12.99..3207.15 rows=2819 loops=1)
Join Filter: ("inner".testdate = date("outer".tested))
-> Seq Scan on tests t1 (cost=0.00..245.19 rows=2819
width=8) (actual time=3.88..8.62 rows=2819 loops=1)
-> Subquery Scan t2 (cost=263.38..265.44 rows=27
width=8) (actual time=0.01..1.11 rows=16 loops=2819)
-> Aggregate (cost=263.38..265.44 rows=27
width=8) (actual time=0.01..1.08 rows=16 loops=2819)
-> Group (cost=263.38..264.75 rows=275
width=8) (actual time=0.01..0.79 rows=275 loops=2819)
-> Sort (cost=263.38..264.07
rows=275 width=8) (actual time=0.00..0.23 rows=275 loops=2819\
)
Sort Key: date(tested)
-> Seq Scan on tests
(cost=0.00..252.24 rows=275 width=8) (actual time=4.07..7.42 ro\
ws=275 loops=1)
Filter: (result =
'3'::text)
Total runtime: 3235.62 msec
(14 rows)
Unfortunately, I'm not very good at reading these, so I'm not sure
what's going on. It looks to me like maybe it's running the inner query
many times instead of just one? That would not be right because there
are no dependencies between the inner and the outer query. It should
only need to run each once. I'm running PostgreSQL 7.3.2.
Any help would be much appreciated.
Roger Dahl