472,348 Members | 2,041 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,348 software developers and data experts.

Query Planner Filtering Of Specified Value From 'On Distinct' Clause

I have a table from which I'm trying to extract certain information. For
historical reasons, we archive every action on a particular thing
('thing' is identified, funnily enough, by 'id'). So the only way to
find out the current state of a particular combination of attributes is
to "select distinct on (id, ...) ... order by date desc".

In the examples below, I've taken real output from psql and done a
global search/replace on various names to convince myself that I'm not
leaking information. The numbers are intact, just the column/table/view
names are different.

For sake of argument, here is the table structure:
database=> \d souce
Table "public.source"
Column | Type | Modifiers
id | integer | not null
date | timestamp without time zone | default 'now'
second | integer | not null
third | text | not null
Indexes: source btree (id)
This table contains about 98000 rows. Let's create a view into the
"source" table:

database=> create view myview as select distinct
on (id, second, third) *
from source
order by id, second, third, date desc;

Now let's look at the query plan for "select * from myview":

database=> explain analyze select * from myview;
Subquery Scan myview (cost=12483.55..13463.96 rows=9804 width=63)
(actual time=1262.08..1450.68 rows=21089 loops=1)
-> Unique (cost=12483.55..13463.96 rows=9804 width=63)
(actual time=1262.06..1406.71 rows=21089 loops=1)
-> Sort (cost=12483.55..12728.65 rows=98041 width=63)
(actual time=1262.06..1339.34 rows=98063 loops=1)
Sort Key: source.id, source.second, source."third",
-> Seq Scan on source (cost=0.00..2247.41 rows=98041
width=63) (actual time=0.02..269.07 rows=98063 loops=1)
Total runtime: 1467.78 msec
(6 rows)

I'm not sure if there's really anything I can do to make that go faster.
But look what happens when I specify a particular value for one of the
fields in the "on distinct" clause:

database=> explain analyze select * from myview where id=12345;
Subquery Scan myview (cost=12483.55..13463.96 rows=9804 width=63)
(actual time=1446.58..1446.81 rows=2 loops=1)
Filter: (id = 12345)
-> Unique (cost=12483.55..13463.96 rows=9804 width=63)
(actual time=1258.87..1434.39 rows=21089 loops=1)
-> Sort (cost=12483.55..12728.65 rows=98041 width=63)
(actual time=1258.87..1336.89 rows=98063 loops=1)
Sort Key: source.id, source.second, source."third",
-> Seq Scan on source (cost=0.00..2247.41 rows=98041 width=63)
(actual time=0.02..268.54 rows=98063 loops=1)
Total runtime: 1457.53 msec
(7 rows)
Hmm.. not as efficient as I would have liked. The planner happily
executed the whole view, and only then filtered out just the bits that I
wanted. The most expensive parts of the query were done on the
unfiltered set. The result set is only 0.002% of the whole data set, so
I was a little shocked that it took *longer* to generate the results
that I wanted, than it took to create the unfiltered results!

Is there any reason why it would not be sensible to push the filter down
to just after the first sequential scan of source? Since "id" is one of
the fields in the "on distinct" condition, doesn't it make sense that
pushing the filter further down would reduce the amount of work done to
get exactly the same result? The same would be true if I was to filter
on a specific "second" or "third" value.

I understand that filtering on columns not present in the "on distinct"
clause would have drastically different effects when done before or
after the "Unique" processing. However, since the column(s) I'm
filtering on are only those present in the "on distinct" clause, it
makes no difference whether the filter is before or after the

Am I totally stark, raving mad?
Alex Satrapa
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?


Nov 12 '05 #1
1 2386
Alex Satrapa <al**@lintelsys.com.au> writes:
Is there any reason why it would not be sensible to push the filter down
to just after the first sequential scan of source?

Try 7.4 ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?


Nov 12 '05 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

by: Jonathan | last post by:
Hi all! For a match schedule I would like to find all possible combinations of teams playing home and away (without teams playing to themselves...
by: Ed_No_Spam_Please_Weber | last post by:
Hello All & Thanks in advance for your help! Background: 1) tblT_Documents is the primary parent transaction table that has 10 fields and...
by: Egyd Csaba | last post by:
Hi All, how can I improve the query performance in the following situation: I have a big (4.5+ million rows) table. One query takes approx. 9 sec...
by: Riley DeWiley | last post by:
I am continually amazed by SQL's ability to humble me .... I have a toy query into a toy database that looks just like this: SELECT . FROM f,...
by: CSN | last post by:
I have a pretty simple select query that joins a table (p) with 125K rows with another table (pc) with almost one million rows: select p.* from...
by: Zygo Blaxell | last post by:
I have a table with a few million rows of temperature data keyed by timestamp. I want to group these rows by timestamp intervals (e.g. every 32...
by: Jeff Boes | last post by:
I hope this helps someone else ... I had struggled some time ago with attempts to get a rank of values query to work, but then I gave up and set it...
by: clare at snyder.on.ca | last post by:
I have a SQL query I need to design to select name and email addresses for policies that are due and not renewed in a given time period. The...
by: chrisdavis | last post by:
I'm trying to filter by query or put those values in a distinct query in a where clause in some sort of list that it goes through but NOT at the...
by: better678 | last post by:
Question: Discuss your understanding of the Java platform. Is the statement "Java is interpreted" correct? Answer: Java is an object-oriented...
by: teenabhardwaj | last post by:
How would one discover a valid source for learning news, comfort, and help for engineering designs? Covering through piles of books takes a lot of...
by: Kemmylinns12 | last post by:
Blockchain technology has emerged as a transformative force in the business world, offering unprecedented opportunities for innovation and...
by: antdb | last post by:
Ⅰ. Advantage of AntDB: hyper-convergence + streaming processing engine In the overall architecture, a new "hyper-convergence" concept was...
by: AndyPSV | last post by:
HOW CAN I CREATE AN AI with an .executable file that would suck all files in the folder and on my computerHOW CAN I CREATE AN AI with an .executable...
by: Arjunsri | last post by:
I have a Redshift database that I need to use as an import data source. I have configured the DSN connection using the server, port, database, and...
by: WisdomUfot | last post by:
It's an interesting question you've got about how Gmail hides the HTTP referrer when a link in an email is clicked. While I don't have the specific...
by: Oralloy | last post by:
Hello Folks, I am trying to hook up a CPU which I designed using SystemC to I/O pins on an FPGA. My problem (spelled failure) is with the...
by: Carina712 | last post by:
Setting background colors for Excel documents can help to improve the visual appeal of the document and make it easier to read and understand....

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.