
bug in query planning?

I have a query which does not use column indexes that it should use. I
have discovered some interesting behaviors of Postgres which may
indicate a bug in the database's query planning.

Take a look at the query below. There is a btree index on both
m.account_id and a.account_id. Query (1) does not use the index on the
messages table, instead opting for a full table scan, thus killing
performance. The messages table can contain potentially hundreds of
thousands or millions of rows. Even at 50,000, it's murder.

Query (2) below is the same query, but we reverse the order of the
tables. It's obviously not quite the same query semantically, even
though in my case it should always produce the same result. It is
interesting to note that it uses the indexes, though.

Finally, query (3) below uses traditional joining (non-ANSI). Indexes
are correctly used in that query. The suggestion is that Postgres does
not correctly analyze queries using ANSI joins. Indexes are
occasionally skipped when they should be used. This seems like a bug
in the planner. I'm using Postgres version 7.3.4.

Thanks in advance for any comments...
steve
Query (1)
=========
defender=# explain analyze
defender-# select count(message_id)
defender-# from messages m
defender-# left join accounts a
defender-# on m.account_id::bigint = a.account_id::bigint
defender-# where a.email = 's******@neosynapse.net';
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=20461.10..20461.10 rows=1 width=47) (actual time=1420.09..1420.09 rows=1 loops=1)
   ->  Hash Join  (cost=30.77..20334.38 rows=50687 width=47) (actual time=0.51..1319.69 rows=51419 loops=1)
         Hash Cond: ("outer".account_id = "inner".account_id)
         Filter: ("inner".email = 's******@neosynapse.net'::text)
         ->  Seq Scan on messages m  (cost=0.00..19289.87 rows=50687 width=16) (actual time=0.06..703.89 rows=52541 loops=1)
         ->  Hash  (cost=30.76..30.76 rows=3 width=31) (actual time=0.40..0.40 rows=0 loops=1)
               ->  Index Scan using accounts_pkey on accounts a  (cost=0.00..30.76 rows=3 width=31) (actual time=0.17..0.38 rows=3 loops=1)
 Total runtime: 1420.25 msec
(8 rows)

Query (2)
=========
defender=# explain analyze
defender-# select count(message_id)
defender-# from accounts a
defender-# left join messages m
defender-# on a.account_id::bigint = m.account_id::bigint
defender-# where a.email = 's******@neosynapse.net';

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6806.54..6806.54 rows=1 width=24) (actual time=792.14..792.14 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..6764.30 rows=16896 width=24) (actual time=0.38..718.12 rows=51419 loops=1)
         ->  Index Scan using accounts_email on accounts a  (cost=0.00..8.98 rows=1 width=8) (actual time=0.22..0.25 rows=1 loops=1)
               Index Cond: (email = 's******@neosynapse.net'::text)
         ->  Index Scan using messages_account_id on messages m  (cost=0.00..6544.13 rows=16896 width=16) (actual time=0.15..593.15 rows=51419 loops=1)
               Index Cond: ("outer".account_id = m.account_id)
 Total runtime: 792.33 msec
(7 rows)

Query (3)
=========
defender=# explain analyze
defender-# select count(message_id)
defender-# from messages m, accounts a
defender-# where m.account_id::bigint = a.account_id::bigint
defender-# and a.email = 's******@neosynapse.net';

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6806.54..6806.54 rows=1 width=24) (actual time=782.30..782.30 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..6764.30 rows=16896 width=24) (actual time=0.33..708.52 rows=51422 loops=1)
         ->  Index Scan using accounts_email on accounts a  (cost=0.00..8.98 rows=1 width=8) (actual time=0.15..0.18 rows=1 loops=1)
               Index Cond: (email = 's******@neosynapse.net'::text)
         ->  Index Scan using messages_account_id on messages m  (cost=0.00..6544.13 rows=16896 width=16) (actual time=0.15..578.23 rows=51422 loops=1)
               Index Cond: (m.account_id = "outer".account_id)
 Total runtime: 782.46 msec
(7 rows)
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #1


"Steven D.Arnold" <st*****@neosynapse.net> writes:
Query (2) below is the same query, but we reverse the order of the
tables. It's obviously not quite the same query semantically, even
though in my case it should always produce the same result.
Since it is in fact not the same query, I'm unclear on why you expect
it to produce the same plan.
I'm using version 7.3.4 of Postgres.


FWIW, I believe that 7.4 will recognize that (1) and (3) are
semantically equivalent.

regards, tom lane


Nov 12 '05 #2


On Dec 21, 2003, at 11:47 PM, Tom Lane wrote:
> "Steven D.Arnold" <st*****@neosynapse.net> writes:
>> Query (2) below is the same query, but we reverse the order of the
>> tables. It's obviously not quite the same query semantically, even
>> though in my case it should always produce the same result.
>
> Since it is in fact not the same query, I'm unclear on why you expect
> it to produce the same plan.

What I expect is for both queries to use the index on the messages
table! Why is it not doing that?

> FWIW, I believe that 7.4 will recognize that (1) and (3) are
> semantically equivalent.

I will try 7.4 and report back.

steve

Nov 12 '05 #3

The queries are listed here for the referentially (yes that's a pun)
challenged.

Query 1:
SELECT COUNT(message_id)
FROM messages m
LEFT JOIN accounts a
ON m.account_id::bigint = a.account_id::bigint
WHERE a.email = 's******@neosynapse.net';

Query 2:
SELECT COUNT(message_id)
FROM accounts a
LEFT JOIN messages m
ON a.account_id::bigint = m.account_id::bigint
WHERE a.email = 's******@neosynapse.net';

Query 3:
SELECT COUNT(message_id)
FROM messages m, accounts a
WHERE m.account_id::bigint = a.account_id::bigint
AND a.email = 's******@neosynapse.net';

From what I can see they are not the same query and therefore shouldn't
use the same plan.

The first query says: go get all the messages (best done with a seq
scan, since there is no WHERE clause to limit the results from the
messages table; an index scan would just add the overhead of reading
the index pages and resolving the index entries, and would turn the
table access into random reads instead of sequential ones, without
actually limiting what gets returned); match those with as many
accounts as you can, and return a row for every message (note the
LEFT JOIN). Next, filter all of the results on the account email
(which only eliminates about 1,100 messages out of 52,000). Now count
how many messages are left, which returns 51419.

The second query says: get all of the accounts filtered by email
address (the planner can use the WHERE clause this time), giving 1
row. Now match that to every message for this account_id, returning
at least one row even if there are no messages for this account (note
again the LEFT JOIN); this uses the index scan because the planner
expects the index selectivity to be approximately 1/4 of the full
table (it's wrong). Now count how many messages I have, which returns
51419.

The third query says: give me all of the messages for the accounts
where email = 's******@neosynapse.net', and I don't care where you
start from. The optimizer, after considering various possible plans,
is smart enough to realize that email = 'blah' is indexed and its
selectivity is 1 row, which brings us back to the situation in query
2, with one small change: if there are no messages for the account in
question, you get no row back at all, leading to a more efficient
aggregation step.
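As a quick sanity check that the three formulations really do return
the same count, here is an editorial sketch using Python's sqlite3
with toy data (hypothetical schema and rows; sqlite's planner is
nothing like Postgres's, so this only confirms the results agree,
not the plans):

```python
import sqlite3

# Toy versions of the thread's tables (hypothetical data; the
# ::bigint casts are dropped since sqlite has no such type).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE messages (message_id INTEGER PRIMARY KEY, account_id INTEGER);
    INSERT INTO accounts VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO messages VALUES (10, 1), (11, 1), (12, 2);
""")

q1 = """SELECT COUNT(message_id) FROM messages m
        LEFT JOIN accounts a ON m.account_id = a.account_id
        WHERE a.email = 'a@example.com'"""
q2 = """SELECT COUNT(message_id) FROM accounts a
        LEFT JOIN messages m ON a.account_id = m.account_id
        WHERE a.email = 'a@example.com'"""
q3 = """SELECT COUNT(message_id) FROM messages m, accounts a
        WHERE m.account_id = a.account_id AND a.email = 'a@example.com'"""

# All three count the same two messages for account 1.
counts = [con.execute(q).fetchone()[0] for q in (q1, q2, q3)]
print(counts)  # [2, 2, 2]
```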

Steven D.Arnold wrote:
> On Dec 21, 2003, at 11:47 PM, Tom Lane wrote:
>> "Steven D.Arnold" <st*****@neosynapse.net> writes:
>>> Query (2) below is the same query, but we reverse the order of the
>>> tables. It's obviously not quite the same query semantically, even
>>> though in my case it should always produce the same result.

You are correct that the queries produce the same results, but they
are telling the planner to do completely different things. The query
doesn't show it, but if the behavior you are desiring happened in
postgres (unless you can show the relational algebra that makes it
work), I would have to start looking for a new database (that's a
disturbing thought).

>> Since it is in fact not the same query, I'm unclear on why you expect
>> it to produce the same plan.
>
> What I expect is for both queries to use the index on the messages
> table! Why is it not doing that?

Because of the table ordering and the left join in 7.3.x.
Because of the left join in 7.4.

>> FWIW, I believe that 7.4 will recognize that (1) and (3) are
>> semantically equivalent.
>
> I will try 7.4 and report back.


I don't believe the optimiser (in any database that cares about giving
you the correct results) can determine that a non-constrained primary
table in a left join can be rewritten as either of your other two
queries (but there are smarter people than me working on Postgres, so I
could be wrong).

> steve


My suggestion would be to place the more selective table first in a
JOIN, and get rid of the LEFT JOINs unless that's exactly what you
want. For more information about the different JOIN methods, RTFM.

I would also suggest tuning your random_page_cost toward 1, because
random access is obviously being overestimated for your hardware.
(You might want to look at tuning your parameters in general.)
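For reference, that knob can be tried per-session before touching
postgresql.conf; the value below is illustrative only, not a
recommendation:

```sql
-- Illustrative values only; the right setting depends on your hardware.
SHOW random_page_cost;       -- the 7.3 default is 4
SET random_page_cost = 2;    -- try a lower value for this session
-- Re-run EXPLAIN ANALYZE and compare; if the plans improve, make the
-- change permanent in postgresql.conf.
```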

And in the future you should run a query at least one extra time to
see the difference caching makes (the second run of an EXPLAIN ANALYZE
is usually quite different from the first for tables of this size).

DeJuan

Nov 12 '05 #4


DeJuan Jackson <dj******@speedfc.com> writes:
> Query 1:
> SELECT COUNT(message_id)
> FROM messages m
> LEFT JOIN accounts a
> ON m.account_id::bigint = a.account_id::bigint
> WHERE a.email = 's******@neosynapse.net';
>
> Query 2:
> SELECT COUNT(message_id)
> FROM accounts a
> LEFT JOIN messages m
> ON a.account_id::bigint = m.account_id::bigint
> WHERE a.email = 's******@neosynapse.net';
>
> Query 3:
> SELECT COUNT(message_id)
> FROM messages m, accounts a
> WHERE m.account_id::bigint = a.account_id::bigint
> AND a.email = 's******@neosynapse.net';
>
> From what I can see they are not the same query and therefore shouldn't
> use the same plan.


Actually, queries 1 and 3 are equivalent, and I believe PG 7.4 will
recognize them as such. The reason is that the WHERE clause "a.email =
'something'" cannot succeed when a.email is NULL; therefore, there is no
point in the JOIN being a LEFT JOIN --- any null-extended rows added by
the left join will be thrown away again by the WHERE clause. We may as
well reduce the LEFT JOIN to a plain inner JOIN, whereupon query 1 is
obviously the same as query 3. PG 7.4's optimizer can make exactly this
sequence of deductions. The bit of knowledge it needs for this is that
the '=' operator involved is STRICT, ie, yields NULL for NULL input.
All the standard '=' operators are strict and are so marked in the
catalogs. (If you are defining a user-defined type, don't forget to
mark your operators strict where applicable.)
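For anyone defining such a type, a hypothetical sketch of what that
marking looks like (the type name, function name, and C implementation
are made up for illustration):

```sql
-- Hypothetical user-defined equality operator for a type "mytype".
-- STRICT declares that the function returns NULL on NULL input, which
-- is what licenses the outer-join-to-inner-join reduction described
-- above.
CREATE FUNCTION mytype_eq(mytype, mytype) RETURNS boolean
    AS 'MODULE_PATHNAME', 'mytype_eq'
    LANGUAGE C STRICT;

CREATE OPERATOR = (
    leftarg    = mytype,
    rightarg   = mytype,
    procedure  = mytype_eq,
    commutator = =
);
```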

I believe that query 2 is really equivalent to the others as well, but
proving it is more subtle. The reason is that COUNT(message_id) does
not count rows where message_id is NULL, and so any null-extended rows
added by the LEFT JOIN won't be counted, and so we might as well reduce
the LEFT JOIN to a plain inner JOIN. PG's optimizer will not recognize
this, however. Possibly it could if anyone wanted to figure out how.
Right now we make very few assumptions about the behavior of aggregate
functions, but I think you could prove that this is safe based on the
behavior of nodeAgg.c for strict transition functions. Next question
is whether the case would come up often enough to be worth testing
for ...
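The COUNT(message_id) subtlety can be seen directly in any engine with
standard NULL semantics; an editorial sketch with sqlite3 (toy data,
one account with no messages):

```python
import sqlite3

# Hypothetical minimal data: a single account with no messages at all.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE messages (message_id INTEGER PRIMARY KEY, account_id INTEGER);
    INSERT INTO accounts VALUES (1, 'empty@example.com');
""")

# The LEFT JOIN produces one null-extended row (message_id IS NULL) ...
rows = con.execute("""SELECT m.message_id FROM accounts a
                      LEFT JOIN messages m ON a.account_id = m.account_id
                      WHERE a.email = 'empty@example.com'""").fetchall()
print(rows)  # [(None,)]

# ... but COUNT(message_id) skips NULLs, so the aggregate is 0 --
# the same answer a plain inner join would have produced.
n = con.execute("""SELECT COUNT(message_id) FROM accounts a
                   LEFT JOIN messages m ON a.account_id = m.account_id
                   WHERE a.email = 'empty@example.com'""").fetchone()[0]
print(n)  # 0
```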

regards, tom lane


Nov 12 '05 #6

Thanks to all for the detailed replies. I just wanted to let everyone
know -- for future google searches as much as anything else -- that
dumping the database, upgrading to 7.4.1 and reloading did solve the
problem. All the queries I mentioned now use the available indices,
except for understandable cases such as the number of rows in a table
being really small.

Thanks for the tip and thanks for the improvements in 7.4.1 that fixed
this problem.

steve

Nov 12 '05 #7
