By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,500 Members | 2,809 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,500 IT Pros & Developers. It's quick & easy.

Query Planner Filtering Of Specified Value From 'On Distinct' Clause

P: n/a
I have a table from which I'm trying to extract certain information. For
historical reasons, we archive every action on a particular thing
('thing' is identified, funnily enough, by 'id'). So the only way to
find out the current state of a particular combination of attributes is
to "select distinct on (id, ...) ... order by date desc".

In the examples below, I've taken real output from psql and done a
global search/replace on various names to convince myself that I'm not
leaking information. The numbers are intact, just the column/table/view
names are different.

For sake of argument, here is the table structure:
database=> \d souce
Table "public.source"
Column | Type | Modifiers
---------+-----------------------------+---------------
id | integer | not null
date | timestamp without time zone | default 'now'
second | integer | not null
third | text | not null
Indexes: source btree (id)
This table contains about 98000 rows. Let's create a view into the
"source" table:

database=> create view myview as select distinct
on (id, second, third) *
from source
order by id, second, third, date desc;
CREATE VIEW

Now let's look at the query plan for "select * from myview":

database=> explain analyze select * from myview;
QUERY PLAN
--------------------------------------------------------------
Subquery Scan myview (cost=12483.55..13463.96 rows=9804 width=63)
(actual time=1262.08..1450.68 rows=21089 loops=1)
-> Unique (cost=12483.55..13463.96 rows=9804 width=63)
(actual time=1262.06..1406.71 rows=21089 loops=1)
-> Sort (cost=12483.55..12728.65 rows=98041 width=63)
(actual time=1262.06..1339.34 rows=98063 loops=1)
Sort Key: source.id, source.second, source."third",
source.date
-> Seq Scan on source (cost=0.00..2247.41 rows=98041
width=63) (actual time=0.02..269.07 rows=98063 loops=1)
Total runtime: 1467.78 msec
(6 rows)

I'm not sure if there's really anything I can do to make that go faster.
But look what happens when I specify a particular value for one of the
fields in the "on distinct" clause:

database=> explain analyze select * from myview where id=12345;
QUERY PLAN
--------------------------------------------------------------
Subquery Scan myview (cost=12483.55..13463.96 rows=9804 width=63)
(actual time=1446.58..1446.81 rows=2 loops=1)
Filter: (id = 12345)
-> Unique (cost=12483.55..13463.96 rows=9804 width=63)
(actual time=1258.87..1434.39 rows=21089 loops=1)
-> Sort (cost=12483.55..12728.65 rows=98041 width=63)
(actual time=1258.87..1336.89 rows=98063 loops=1)
Sort Key: source.id, source.second, source."third",
source.date
-> Seq Scan on source (cost=0.00..2247.41 rows=98041 width=63)
(actual time=0.02..268.54 rows=98063 loops=1)
Total runtime: 1457.53 msec
(7 rows)
Hmm.. not as efficient as I would have liked. The planner happily
executed the whole view, and only then filtered out just the bits that I
wanted. The most expensive parts of the query were done on the
unfiltered set. The result set is only 0.002% of the whole data set, so
I was a little shocked that it took *longer* to generate the results
that I wanted, than it took to create the unfiltered results!

Is there any reason why it would not be sensible to push the filter down
to just after the first sequential scan of source? Since "id" is one of
the fields in the "on distinct" condition, doesn't it make sense that
pushing the filter further down would reduce the amount of work done to
get exactly the same result? The same would be true if I was to filter
on a specific "second" or "third" value.

I understand that filtering on columns not present in the "on distinct"
clause would have drastically different effects when done before or
after the "Unique" processing. However, since the column(s) I'm
filtering on are only those present in the "on distinct" clause, it
makes no difference whether the filter is before or after the
"Unique"-ifying.

Am I totally stark, raving mad?
Alex Satrapa
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #1
Share this Question
Share on Google+
1 Reply


P: n/a
Alex Satrapa <al**@lintelsys.com.au> writes:
Is there any reason why it would not be sensible to push the filter down
to just after the first sequential scan of source?


Try 7.4 ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #2

This discussion thread is closed

Replies have been disabled for this discussion.