bug in query planning?

I have a query that does not use the column indexes I believe it should.
I have discovered some interesting Postgres behavior that may indicate a
bug in the database's query planning.

Take a look at the query below. There is a btree index on both
m.account_id and a.account_id. Query (1) does not use the index on the
messages table, instead opting for a full table scan, thus killing
performance. The messages table can contain potentially hundreds of
thousands or millions of rows. Even at 50,000, it's murder.

Query (2) below is the same query, but we reverse the order of the
tables. It's obviously not quite the same query semantically, even
though in my case it should always produce the same result. It is
interesting to note that it uses the indexes, though.

Finally, query (3) below uses traditional joining (non-ANSI). Indexes
are correctly used in that query. The suggestion is that Postgres does
not correctly analyze queries using ANSI joins. Indexes are
occasionally skipped when they should be used. This seems like a bug
in Postgres. I'm using version 7.3.4 of Postgres.

Thanks in advance for any comments...
steve
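For reference, here is a sketch of the schema as I understand it. The
table, column, and index names match the queries and plans below, but the
exact types and constraints are my guesses, not a dump of the real DDL:

CREATE TABLE accounts (
    account_id  bigint PRIMARY KEY,   -- backs accounts_pkey
    email       text
);
CREATE INDEX accounts_email ON accounts (email);

CREATE TABLE messages (
    message_id  bigint,
    account_id  bigint
);
CREATE INDEX messages_account_id ON messages (account_id);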
Query (1)
=========
defender=# explain analyze
defender-# select count(message_id)
defender-# from messages m
defender-# left join accounts a
defender-# on m.account_id::bigint = a.account_id::bigint
defender-# where a.email = 's******@neosynapse.net';
                                 QUERY PLAN
------------------------------------------------------------------------
 Aggregate  (cost=20461.10..20461.10 rows=1 width=47) (actual time=1420.09..1420.09 rows=1 loops=1)
   ->  Hash Join  (cost=30.77..20334.38 rows=50687 width=47) (actual time=0.51..1319.69 rows=51419 loops=1)
         Hash Cond: ("outer".account_id = "inner".account_id)
         Filter: ("inner".email = 's******@neosynapse.net'::text)
         ->  Seq Scan on messages m  (cost=0.00..19289.87 rows=50687 width=16) (actual time=0.06..703.89 rows=52541 loops=1)
         ->  Hash  (cost=30.76..30.76 rows=3 width=31) (actual time=0.40..0.40 rows=0 loops=1)
               ->  Index Scan using accounts_pkey on accounts a  (cost=0.00..30.76 rows=3 width=31) (actual time=0.17..0.38 rows=3 loops=1)
 Total runtime: 1420.25 msec
(8 rows)
Query (2)
=========
defender=# explain analyze
defender-# select count(message_id)
defender-# from accounts a
defender-# left join messages m
defender-# on a.account_id::bigint = m.account_id::bigint
defender-# where a.email = 's******@neosynapse.net';

                                 QUERY PLAN
------------------------------------------------------------------------
 Aggregate  (cost=6806.54..6806.54 rows=1 width=24) (actual time=792.14..792.14 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..6764.30 rows=16896 width=24) (actual time=0.38..718.12 rows=51419 loops=1)
         ->  Index Scan using accounts_email on accounts a  (cost=0.00..8.98 rows=1 width=8) (actual time=0.22..0.25 rows=1 loops=1)
               Index Cond: (email = 's******@neosynapse.net'::text)
         ->  Index Scan using messages_account_id on messages m  (cost=0.00..6544.13 rows=16896 width=16) (actual time=0.15..593.15 rows=51419 loops=1)
               Index Cond: ("outer".account_id = m.account_id)
 Total runtime: 792.33 msec
(7 rows)
Query (3)
=========
defender=# explain analyze
defender-# select count(message_id)
defender-# from messages m, accounts a
defender-# where m.account_id::bigint = a.account_id::bigint
defender-# and a.email = 's******@neosynapse.net';

                                 QUERY PLAN
------------------------------------------------------------------------
 Aggregate  (cost=6806.54..6806.54 rows=1 width=24) (actual time=782.30..782.30 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..6764.30 rows=16896 width=24) (actual time=0.33..708.52 rows=51422 loops=1)
         ->  Index Scan using accounts_email on accounts a  (cost=0.00..8.98 rows=1 width=8) (actual time=0.15..0.18 rows=1 loops=1)
               Index Cond: (email = 's******@neosynapse.net'::text)
         ->  Index Scan using messages_account_id on messages m  (cost=0.00..6544.13 rows=16896 width=16) (actual time=0.15..578.23 rows=51422 loops=1)
               Index Cond: (m.account_id = "outer".account_id)
 Total runtime: 782.46 msec
(7 rows)
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #1
"Steven D.Arnold" <st*****@neosynapse.net> writes:
> Query (2) below is the same query, but we reverse the order of the
> tables. It's obviously not quite the same query semantically, even
> though in my case it should always produce the same result.

Since it is in fact not the same query, I'm unclear on why you expect
it to produce the same plan.

> I'm using version 7.3.4 of Postgres.

FWIW, I believe that 7.4 will recognize that (1) and (3) are
semantically equivalent.

			regards, tom lane


Nov 12 '05 #2

On Dec 21, 2003, at 11:47 PM, Tom Lane wrote:
> "Steven D.Arnold" <st*****@neosynapse.net> writes:
>> Query (2) below is the same query, but we reverse the order of the
>> tables. It's obviously not quite the same query semantically, even
>> though in my case it should always produce the same result.
>
> Since it is in fact not the same query, I'm unclear on why you expect
> it to produce the same plan.

What I expect is for both queries to use the index on the messages
table! Why is it not doing that?

> FWIW, I believe that 7.4 will recognize that (1) and (3) are
> semantically equivalent.

I will try 7.4 and report back.

steve

Nov 12 '05 #3
The queries are listed here for the referentially (yes that's a pun)
challenged.

Query 1:
SELECT COUNT(message_id)
FROM messages m
LEFT JOIN accounts a
ON m.account_id::bigint = a.account_id::bigint
WHERE a.email = 's******@neosynapse.net';

Query 2:
SELECT COUNT(message_id)
FROM accounts a
LEFT JOIN messages m
ON a.account_id::bigint = m.account_id::bigint
WHERE a.email = 's******@neosynapse.net';

Query 3:
SELECT COUNT(message_id)
FROM messages m, accounts a
WHERE m.account_id::bigint = a.account_id::bigint
AND a.email = 's******@neosynapse.net';

From what I can see they are not the same query and therefore shouldn't
use the same plan.

The first query says: go get all the messages -- best done with a seq
scan, since there is no WHERE clause to limit the results from the
messages table (an index scan would just add the overhead of reading the
index pages and resolving the index entries, and would turn the table
access into random reads instead of a sequential read, without actually
limiting what gets returned). Then match each message with as many
accounts as you can, returning a row for every message (note the LEFT
JOIN). Next, filter all of the results on the account email (which only
eliminates about 1100 messages out of 52000). Finally, count how many
messages are left, which returns 51419.
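(If you want to see what the index plan would have cost here, you can
temporarily discourage sequential scans -- a diagnostic trick for
comparing plans only, never a production setting:

SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT COUNT(message_id)
FROM messages m
LEFT JOIN accounts a
ON m.account_id::bigint = a.account_id::bigint
WHERE a.email = 's******@neosynapse.net';
SET enable_seqscan = on;

Comparing the two EXPLAIN ANALYZE outputs shows you what the planner
thinks each alternative costs versus what it actually takes.)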

The second query says: get all of the accounts filtered by email address
(the planner can use the WHERE clause this time), giving 1 row. Now
match that to every message for this account_id, returning at least one
row even if there are no messages for this account (note again the LEFT
JOIN). This uses the index scan because the planner expects the index
selectivity to be approximately 1/4 of the full table (it's wrong).
Finally, count the messages, which returns 51419.

The third query says: give me all of the messages for the accounts where
email = 's******@neosynapse.net', and I don't care where you start from.
The optimizer, after considering various possible plans, is smart enough
to realize that email = 'blah' is indexed and its selectivity is 1 row,
which brings us back to the situation in query 2, with one small change:
if there are no messages for the account in question, no row is
returned, leading to a more efficient aggregation step.

Steven D.Arnold wrote:
> On Dec 21, 2003, at 11:47 PM, Tom Lane wrote:
>> "Steven D.Arnold" <st*****@neosynapse.net> writes:
>>> Query (2) below is the same query, but we reverse the order of the
>>> tables. It's obviously not quite the same query semantically, even
>>> though in my case it should always produce the same result.

You are correct that the queries produce the same results, but they are
telling the planner to do completely different things. The query text
doesn't show it, but if the behavior you are desiring happened in
Postgres (unless you can show the relational algebra that makes it
work), I would have to start looking for a new database (that's a
disturbing thought).

>> Since it is in fact not the same query, I'm unclear on why you expect
>> it to produce the same plan.
>
> What I expect is for both queries to use the index on the messages
> table! Why is it not doing that?

Because of the table ordering and the left join in 7.3.x.
Because of the left join alone in 7.4.

>> FWIW, I believe that 7.4 will recognize that (1) and (3) are
>> semantically equivalent.
>
> I will try 7.4 and report back.


I don't believe the optimiser (in any database that cares about giving
you correct results) can determine that the non-constrained primary
table of a left join can be rewritten as either of your other two
queries (but there are smarter people than me working on Postgres, so I
could be wrong).

steve


My suggestion would be to place the more selective table first in a
join, and get rid of the LEFT JOINs unless an outer join is exactly what
you want. For more information about the different join types, RTFM.

I would also suggest tuning random_page_cost down toward 1, because
random access is clearly being overestimated for your hardware. (You
might want to look at tuning your parameters in general.)
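Roughly like this -- the value 1.5 is only illustrative, not a
recommendation; measure on your own hardware:

SHOW random_page_cost;        -- 4 is the shipped default
SET random_page_cost = 1.5;   -- session-level experiment
-- now re-run the EXPLAIN ANALYZE queries from above and compare plans;
-- to make a good value permanent, set it in postgresql.conf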

And in the future you should run a query at least one extra time to see
the difference caching makes (the second run of an EXPLAIN ANALYZE is
usually quite different from the first for tables of this size).

DeJuan

Nov 12 '05 #4
DeJuan Jackson <dj******@speedfc.com> writes:
> Query 1:
> SELECT COUNT(message_id)
> FROM messages m
> LEFT JOIN accounts a
> ON m.account_id::bigint = a.account_id::bigint
> WHERE a.email = 's******@neosynapse.net';
>
> Query 2:
> SELECT COUNT(message_id)
> FROM accounts a
> LEFT JOIN messages m
> ON a.account_id::bigint = m.account_id::bigint
> WHERE a.email = 's******@neosynapse.net';
>
> Query 3:
> SELECT COUNT(message_id)
> FROM messages m, accounts a
> WHERE m.account_id::bigint = a.account_id::bigint
> AND a.email = 's******@neosynapse.net';
>
> From what I can see they are not the same query and therefore shouldn't
> use the same plan.


Actually, queries 1 and 3 are equivalent, and I believe PG 7.4 will
recognize them as such. The reason is that the WHERE clause "a.email =
'something'" cannot succeed when a.email is NULL; therefore, there is no
point in the JOIN being a LEFT JOIN --- any null-extended rows added by
the left join will be thrown away again by the WHERE clause. We may as
well reduce the LEFT JOIN to a plain inner JOIN, whereupon query 1 is
obviously the same as query 3. PG 7.4's optimizer can make exactly this
sequence of deductions. The bit of knowledge it needs for this is that
the '=' operator involved is STRICT, ie, yields NULL for NULL input.
All the standard '=' operators are strict and are so marked in the
catalogs. (If you are defining a user-defined type, don't forget to
mark your operators strict where applicable.)
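To illustrate with the queries from this thread (a sketch of the
rewritten form, not the planner's literal output): once the planner
knows '=' is strict, query 1 may be treated as

SELECT COUNT(message_id)
FROM messages m
JOIN accounts a
ON m.account_id::bigint = a.account_id::bigint
WHERE a.email = 's******@neosynapse.net';
-- any messages row with no matching account would have a.email NULL
-- after the left join; the strict '=' then makes the WHERE clause
-- fail for it, so every null-extended row is discarded anyway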

I believe that query 2 is really equivalent to the others as well, but
proving it is more subtle. The reason is that COUNT(message_id) does
not count rows where message_id is NULL, and so any null-extended rows
added by the LEFT JOIN won't be counted, and so we might as well reduce
the LEFT JOIN to a plain inner JOIN. PG's optimizer will not recognize
this, however. Possibly it could if anyone wanted to figure out how.
Right now we make very few assumptions about the behavior of aggregate
functions, but I think you could prove that this is safe based on the
behavior of nodeAgg.c for strict transition functions. Next question
is whether the case would come up often enough to be worth testing
for ...
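A quick way to see the aggregate behavior this argument relies on:

-- count(x) ignores rows where x is NULL, so a null-extended row
-- contributes nothing to the count:
SELECT count(message_id)
FROM (SELECT NULL::bigint AS message_id) AS t;
-- returns 0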

regards, tom lane


Nov 12 '05 #6
Thanks to all for the detailed replies. I just wanted to let everyone
know -- for future Google searches as much as anything else -- that
dumping the database, upgrading to 7.4.1, and reloading did solve the
problem. All the queries I mentioned now use the available indexes,
except in understandable cases, such as when a table has very few rows.

Thanks for the tip and thanks for the improvements in 7.4.1 that fixed
this problem.

steve

Nov 12 '05 #7
