By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
432,002 Members | 2,137 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 432,002 IT Pros & Developers. It's quick & easy.

Mysterious performance of query because of plsql function in where condition

P: n/a
Hi there,

i have a problem with a query that uses the result of a plsql function
In the where clause:

SELECT
assignments.assignment_id,
assignments.package_id AS package_id,
assignments.title AS title,
COUNT(*) AS Count
FROM
assignments INNER JOIN submissions ON
(assignments.assignment_id=submissions.assignment_ id)
WHERE
package_id=949589 AND
submission_status(submissions.submission_id)='clos ed'
GROUP BY
assignments.assignment_id, assignments.package_id, assignments.title
ORDER BY
assignments.title;

Postgres seems to execute the function "submission_status" for every row
of the submissions table (~1500 rows). The query therefore takes quite a
lot time, although in fact no row is returned from the assignments table
when the condition package_id=949589 is used.

QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------
Sort (cost=41.21..41.21 rows=1 width=35) (actual
time=4276.978..4276.978 rows=0 loops=1)
Sort Key: assignments.title
-> HashAggregate (cost=41.19..41.20 rows=1 width=35) (actual
time=4276.970..4276.970 rows=0 loops=1)
-> Hash Join (cost=2.40..41.18 rows=1 width=35) (actual
time=4276.966..4276.966 rows=0 loops=1)
Hash Cond: ("outer".assignment_id =
"inner".assignment_id)
-> Seq Scan on submissions (cost=0.00..38.73 rows=9
width=4) (actual time=10.902..4276.745 rows=38 loops=1)
Filter: (submission_status(submission_id) =
'closed'::text)
-> Hash (cost=2.40..2.40 rows=2 width=35) (actual
time=0.058..0.058 rows=0 loops=1)
-> Seq Scan on assignments (cost=0.00..2.40
rows=2 width=35) (actual time=0.015..0.052 rows=2 loops=1)
Filter: (package_id = 949589)
Total runtime: 4277.078 ms
(11 rows)

I therefore tried to rephrase the query, to make sure that the function
is only used for the rows returned by the join but not even the
following does help (the subselect t1 does not return a single row):

select * from (
SELECT
a.assignment_id, a.package_id, a.title, s.submission_id,
COUNT(*) AS Count
FROM
assignments a INNER JOIN submissions s ON
(a.assignment_id=s.assignment_id)
WHERE
a.package_id=949589
GROUP BY
a.assignment_id, a.package_id, a.title, s.submission_id
) t1
where
submission_status(t1.submission_id)='closed'
order by
title;

QUERY PLAN

------------------------------------------------------------------------
--------------------------------------------------------------
Sort (cost=41.21..41.22 rows=1 width=188) (actual
time=4114.251..4114.251 rows=0 loops=1)
Sort Key: title
-> Subquery Scan t1 (cost=41.20..41.20 rows=1 width=188) (actual
time=4114.242..4114.242 rows=0 loops=1)
-> HashAggregate (cost=41.20..41.20 rows=1 width=39) (actual
time=4114.238..4114.238 rows=0 loops=1)
-> Hash Join (cost=2.40..41.18 rows=1 width=39) (actual
time=4114.235..4114.235 rows=0 loops=1)
Hash Cond: ("outer".assignment_id =
"inner".assignment_id)
-> Seq Scan on submissions s (cost=0.00..38.73
rows=9 width=8) (actual time=7.179..4113.984 rows=38 loops=1)
Filter: (submission_status(submission_id) =
'closed'::text)
-> Hash (cost=2.40..2.40 rows=2 width=35) (actual
time=0.100..0.100 rows=0 loops=1)
-> Seq Scan on assignments a
(cost=0.00..2.40 rows=2 width=35) (actual time=0.045..0.094 rows=2
loops=1)
Filter: (package_id = 949589)
Total runtime: 4114.356 ms
(12 rows)

The function is nevertheless executed for every row in the submissions
table. A simple "select *, submission_status(submission_id) from
submissions" takes about the same time as the 2 queries stated above.

The whole database has been vacuum analysed right before the explain
analyse output has been captured.

What can I do to reduce the time this query takes? And why is the
function executed although there is no row in the result set of t1 in my
rephrased query?

TIA, peter

--
pe***********@wu-wien.ac.at Tel: +43/1/31336/4341
Abteilung fr Wirtschaftsinformatik, Wirtschaftsuniversitaet Wien,
Austria

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #1
Share this Question
Share on Google+
1 Reply


P: n/a
Do you have any indexes created on the submissions table? If not
postgresql has no choice but read every row in the table.

If you do they are not being used. The tuning masters would really need
to see definition of the table if the indexes are not being used.

You might find a lot of pointers in the pgsql-performance mailing list
instead of this one.

Mike
On Thu, 2004-07-01 at 06:52, Peter Alberer wrote:
Hi there,

i have a problem with a query that uses the result of a plsql function
In the where clause:

SELECT
assignments.assignment_id,
assignments.package_id AS package_id,
assignments.title AS title,
COUNT(*) AS Count
FROM
assignments INNER JOIN submissions ON
(assignments.assignment_id=submissions.assignment_ id)
WHERE
package_id=949589 AND
submission_status(submissions.submission_id)='clos ed'
GROUP BY
assignments.assignment_id, assignments.package_id, assignments.title
ORDER BY
assignments.title;

Postgres seems to execute the function "submission_status" for every row
of the submissions table (~1500 rows). The query therefore takes quite a
lot time, although in fact no row is returned from the assignments table
when the condition package_id=949589 is used.

QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------
Sort (cost=41.21..41.21 rows=1 width=35) (actual
time=4276.978..4276.978 rows=0 loops=1)
Sort Key: assignments.title
-> HashAggregate (cost=41.19..41.20 rows=1 width=35) (actual
time=4276.970..4276.970 rows=0 loops=1)
-> Hash Join (cost=2.40..41.18 rows=1 width=35) (actual
time=4276.966..4276.966 rows=0 loops=1)
Hash Cond: ("outer".assignment_id =
"inner".assignment_id)
-> Seq Scan on submissions (cost=0.00..38.73 rows=9
width=4) (actual time=10.902..4276.745 rows=38 loops=1)
Filter: (submission_status(submission_id) =
'closed'::text)
-> Hash (cost=2.40..2.40 rows=2 width=35) (actual
time=0.058..0.058 rows=0 loops=1)
-> Seq Scan on assignments (cost=0.00..2.40
rows=2 width=35) (actual time=0.015..0.052 rows=2 loops=1)
Filter: (package_id = 949589)
Total runtime: 4277.078 ms
(11 rows)

I therefore tried to rephrase the query, to make sure that the function
is only used for the rows returned by the join but not even the
following does help (the subselect t1 does not return a single row):

select * from (
SELECT
a.assignment_id, a.package_id, a.title, s.submission_id,
COUNT(*) AS Count
FROM
assignments a INNER JOIN submissions s ON
(a.assignment_id=s.assignment_id)
WHERE
a.package_id=949589
GROUP BY
a.assignment_id, a.package_id, a.title, s.submission_id
) t1
where
submission_status(t1.submission_id)='closed'
order by
title;

QUERY PLAN

------------------------------------------------------------------------
--------------------------------------------------------------
Sort (cost=41.21..41.22 rows=1 width=188) (actual
time=4114.251..4114.251 rows=0 loops=1)
Sort Key: title
-> Subquery Scan t1 (cost=41.20..41.20 rows=1 width=188) (actual
time=4114.242..4114.242 rows=0 loops=1)
-> HashAggregate (cost=41.20..41.20 rows=1 width=39) (actual
time=4114.238..4114.238 rows=0 loops=1)
-> Hash Join (cost=2.40..41.18 rows=1 width=39) (actual
time=4114.235..4114.235 rows=0 loops=1)
Hash Cond: ("outer".assignment_id =
"inner".assignment_id)
-> Seq Scan on submissions s (cost=0.00..38.73
rows=9 width=8) (actual time=7.179..4113.984 rows=38 loops=1)
Filter: (submission_status(submission_id) =
'closed'::text)
-> Hash (cost=2.40..2.40 rows=2 width=35) (actual
time=0.100..0.100 rows=0 loops=1)
-> Seq Scan on assignments a
(cost=0.00..2.40 rows=2 width=35) (actual time=0.045..0.094 rows=2
loops=1)
Filter: (package_id = 949589)
Total runtime: 4114.356 ms
(12 rows)

The function is nevertheless executed for every row in the submissions
table. A simple "select *, submission_status(submission_id) from
submissions" takes about the same time as the 2 queries stated above.

The whole database has been vacuum analysed right before the explain
analyse output has been captured.

What can I do to reduce the time this query takes? And why is the
function executed although there is no row in the result set of t1 in my
rephrased query?

TIA, peter

--
pe***********@wu-wien.ac.at Tel: +43/1/31336/4341
Abteilung für Wirtschaftsinformatik, Wirtschaftsuniversitaet Wien,
Austria

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #2

This discussion thread is closed

Replies have been disabled for this discussion.