I have a number of very common queries that the optimizer plans a very inefficient plan for. I am using postgres 7.2.3. I vacuum hourly. I'm wonderingwhat I can do to make the queries faster.
Here are the relevant tables:
create table image(
imageid integer not null, /* The image's ID */
containerid integer not null, /* The container that owns it */
name varchar(120) not null, /* Its name */
state bigint not null default 0, /* Its state */
primary key (imageid),
unique (containerid, name) /* All images in a container must be uniquely named */
);
create table ancestry(
containerid integer not null, /* The container that has an ancestor*/
ancestorid integer not null, /* The ancestor of the container */
unique (containerid, ancestorid),
unique (ancestorid, containerid)
);
I have somewhere around 3M rows in the image table, and 37K rows in the ancestry table. The following is representative of some of the common queries I issue:
select * from image natural join ancestry where ancestorid=1000000 and (state & 7::bigint) = 0::bigint;
When I ask postgres to EXPLAIN it, I get the following:
Merge Join (cost=81858.22..81900.60 rows=124 width=49)
-> Sort (cost=81693.15..81693.15 rows=16288 width=41)
-> Seq Scan on image (cost=0.00..80279.17 rows=16288 width=41)
-> Sort (cost=165.06..165.06 rows=45 width=8)
-> Index Scan using ancestry_ancestorid_key on ancestry (cost=0..00..163.83 rows=45 width=8)
It appears to me that the query executes as follows:
1. Scan every row in the image table to find those where (state & 7::bigint) = 0::bigint
2. Sort the results
3. Use an index on ancestry to find rows where ancestorid=1000000
4. Sort the results
5. Join the two
It seems to me that if this query is going to return a small percentage of the rows (which is the common case), it could be done much faster by first joining (all columns involved in the join are indexed), and then by applying the (state & 7::bigint) = 0::bigint constraint to the results. I realize that the query planner is going to have a difficult time estimating the number of rows returned by the bit operator. However, I'd be happy forcing it to always perform the join first, and then apply the state constraint to the results.
Similarly, when I update, I get the following:
explain update image set state=0 from ancestry where ancestorid=1000000and ancestry.containerid=image.containerid and (state & 7::bigint) = 0::bigint;
NOTICE: QUERY PLAN:
Merge Join (cost=81841.92..81884.30 rows=124 width=43)
-> Sort (cost=81676.74..81676.74 rows=16288 width=39)
-> Seq Scan on image (cost=0.00..80279.17 rows=16288 width=39)
-> Sort (cost=165.19..165.19 rows=45 width=4)
-> Index Scan using ancestry_ancestorid_key on ancestry (cost=0..00..163.95 rows=45 width=4)
Is there any way to give the planner a hint, or reword the query and updateso that it executes the way I want?
Thanks in advance.
Robert Wille 1 2687 rw****@iarchives.com ("Robert Wille") wrote in message news:<012c01c340ce$dfc89110$6402a8c0@zucchini>... I have a number of very common queries that the optimizer plans a very inef ficient plan for. I am using postgres 7.2.3. I vacuum hourly. I'm wondering what I can do to make the queries faster.
Here are the relevant tables:
create table image( imageid integer not null, /* The image's ID */ containerid integer not null, /* The container that owns it */ name varchar(120) not null, /* Its name */ state bigint not null default 0, /* Its state */ primary key (imageid), unique (containerid, name) /* All images in a container must be uni quely named */ );
create table ancestry( containerid integer not null, /* The container that has an ancestor */ ancestorid integer not null, /* The ancestor of the container */ unique (containerid, ancestorid), unique (ancestorid, containerid) );
I have somewhere around 3M rows in the image table, and 37K rows in the anc estry table. The following is representative of some of the common queries I issue:
select * from image natural join ancestry where ancestorid=1000000 and (s tate & 7::bigint) = 0::bigint;
When I ask postgres to EXPLAIN it, I get the following:
Merge Join (cost=81858.22..81900.60 rows=124 width=49) -> Sort (cost=81693.15..81693.15 rows=16288 width=41) -> Seq Scan on image (cost=0.00..80279.17 rows=16288 width= 41) -> Sort (cost=165.06..165.06 rows=45 width=8) -> Index Scan using ancestry ancestorid key on ancestry (cost=0 .00..163.83 rows=45 width=8)
It appears to me that the query executes as follows:
1. Scan every row in the image table to find those where (state & 7::bigint ) = 0::bigint 2. Sort the results 3. Use an index on ancestry to find rows where ancestorid=1000000 4. Sort the results 5. Join the two
It seems to me that if this query is going to return a small percentage of the rows (which is the common case), it could be done much faster by first joining (all columns involved in the join are indexed), and then by applyin g the (state & 7::bigint) = 0::bigint constraint to the results. I realiz e that the query planner is going to have a difficult time estimating the n umber of rows returned by the bit operator. However, I'd be happy forcing i t to always perform the join first, and then apply the state constraint to the results.
Similarly, when I update, I get the following:
explain update image set state=0 from ancestry where ancestorid=1000000 and ancestry.containerid=image.containerid and (state & 7::bigint) = 0 ::bigint;
NOTICE: QUERY PLAN:
Merge Join (cost=81841.92..81884.30 rows=124 width=43) -> Sort (cost=81676.74..81676.74 rows=16288 width=39) -> Seq Scan on image (cost=0.00..80279.17 rows=16288 width= 39) -> Sort (cost=165.19..165.19 rows=45 width=4) -> Index Scan using ancestry ancestorid key on ancestry (cost=0 .00..163.95 rows=45 width=4)
Is there any way to give the planner a hint, or reword the query and update so that it executes the way I want?
Thanks in advance.
Robert Wille
--
Try an index on the condition of the join -
CREATE INDEX "stateplus7isnone" on image (containerid) where
(state&7=0);
select * from image i inner join ancestry a using (containerid) where
ancestorid=1000000 and (state&7) = 0;
the index will be used in the query you give - hopefully speeding
access to the data.
--
Tom Hebbron This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Brian Oster |
last post by:
After applying security patch MS03-031 (Sql server ver 8.00.818) a
query that used to execute in under 2 seconds, now takes over 8
Minutes to...
|
by: Paul Mateer |
last post by:
Hi,
I have been running some queries against a table in a my database and
have noted an odd (at least it seems odd to me) performance issue.
...
|
by: Jerry Brenner |
last post by:
Our users have potentially dirty legacy data that they need to get
into our application. We provide a set of staging tables, which map
to our...
|
by: Bob |
last post by:
Hi there,
Need a little help with a certain query that's causing a lot of acid
in my stomach...
Have a table that stores sales measures for a...
|
by: Robert Wille |
last post by:
I have a number of very common queries that the optimizer plans a very inefficient plan for. I am using postgres 7.2.3. I vacuum hourly. I'm...
|
by: apb18 |
last post by:
A bit of query plan strangeness. Suppose you have an inheritance tree
such that the columns 'ID' and 'field' appear in the top level table,
call...
|
by: Együd Csaba |
last post by:
Hi All,
how can I improve the query performance in the following situation:
I have a big (4.5+ million rows) table. One query takes approx. 9 sec...
|
by: Ryan |
last post by:
I came across a situation that I've been unable to explain and was
hoping somebody had an answer:
I had written an update query which was taking...
|
by: stig |
last post by:
hi.
coming from postgresql, i am used to textual references to most of the
things i do with the database. i feel a little lost with all the...
|
by: Kemmylinns12 |
last post by:
Blockchain technology has emerged as a transformative force in the business world, offering unprecedented opportunities for innovation and...
|
by: Naresh1 |
last post by:
What is WebLogic Admin Training?
WebLogic Admin Training is a specialized program designed to equip individuals with the skills and knowledge...
|
by: antdb |
last post by:
Ⅰ. Advantage of AntDB: hyper-convergence + streaming processing engine
In the overall architecture, a new "hyper-convergence" concept was...
|
by: Arjunsri |
last post by:
I have a Redshift database that I need to use as an import data source. I have configured the DSN connection using the server, port, database, and...
|
by: Matthew3360 |
last post by:
Hi,
I have been trying to connect to a local host using php curl. But I am finding it hard to do this. I am doing the curl get request from my web...
|
by: Oralloy |
last post by:
Hello Folks,
I am trying to hook up a CPU which I designed using SystemC to I/O pins on an FPGA.
My problem (spelled failure) is with the...
|
by: Carina712 |
last post by:
Setting background colors for Excel documents can help to improve the visual appeal of the document and make it easier to read and understand....
|
by: BLUEPANDA |
last post by:
At BluePanda Dev, we're passionate about building high-quality software and sharing our knowledge with the community. That's why we've created a SaaS...
|
by: Rahul1995seven |
last post by:
Introduction:
In the realm of programming languages, Python has emerged as a powerhouse. With its simplicity, versatility, and robustness, Python...
| |