Bytes | Developer Community
performance of IN (subquery)

I'm using PG 7.4.3 on Mac OS X.

I am disappointed with the performance of queries like 'select foo from
bar where baz in (subquery)', or updates like 'update bar set foo = 2
where baz in (subquery)'. PG always seems to want to do a sequential
scan of the bar table. I wish there were a way of telling PG, "use the
index on baz in your plan, because I know that the subquery will return
very few results". Where it really matters, I have been constructing
dynamic queries by looping over the values for baz and building a
separate query for each one and combining with a UNION (or just
directly updating, in the update case). Depending on the size of the
bar table, I can get speedups of hundreds or even more than a thousand
times, but it is a big pain to have to do this.
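The UNION workaround described above, sketched with the post's placeholder names (foo, bar, baz; the two literal values are borrowed from the example further down):

```sql
-- Instead of: SELECT foo FROM bar WHERE baz IN (subquery),
-- run the subquery first, then build one indexable query per value:
SELECT foo FROM bar WHERE baz = 41209
UNION
SELECT foo FROM bar WHERE baz = 25047;
```

Each branch can use the index on baz, which is exactly what the IN form fails to do here.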

Any tips?

Thanks,
Kevin Murphy

Illustrated:

The query I want to do is very slow:

select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1);
 bundle_id
-----------
      7644
      7644
(2 rows)
Time: 518.242 ms
The subquery is fast:

SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1;
 element_id
------------
      41209
      25047
(2 rows)
Time: 3.268 ms
And using indexes on the main table is fast:

select bundle_id from build.elements
where elementid in (41209, 25047);
 bundle_id
-----------
      7644
      7644
(2 rows)
Time: 2.468 ms

The plan for the slow query:

egenome_test=# explain analyze select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1);
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=70.33..72.86 rows=25 width=4) (actual time=583.051..583.059 rows=2 loops=1)
   Hash Cond: ("outer".element_id = "inner".elementid)
   ->  HashAggregate  (cost=47.83..47.83 rows=25 width=4) (actual time=0.656..0.658 rows=2 loops=1)
         ->  Hash Join  (cost=22.51..47.76 rows=25 width=4) (actual time=0.615..0.625 rows=2 loops=1)
               Hash Cond: ("outer".superloc_id = "inner".superloc_id)
               ->  Seq Scan on superlocs_2  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.004..0.012 rows=9 loops=1)
               ->  Hash  (cost=22.50..22.50 rows=5 width=4) (actual time=0.076..0.076 rows=0 loops=1)
                     ->  Seq Scan on bundle_superlocs_2  (cost=0.00..22.50 rows=5 width=4) (actual time=0.024..0.033 rows=2 loops=1)
                           Filter: (protobundle_id = 1)
   ->  Hash  (cost=20.00..20.00 rows=1000 width=8) (actual time=581.802..581.802 rows=0 loops=1)
         ->  Seq Scan on elements  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.172..405.243 rows=185535 loops=1)
 Total runtime: 593.843 ms
(12 rows)
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 23 '05 #1
On Thu, 26 Aug 2004, Kevin Murphy wrote:
The query I want to do is very slow:

select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1);
-----------
7644
7644
(2 rows)
Time: 518.242 ms


What field type is protobundle_id? If you typecast the '1' to be the
same, does the index get used?
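A minimal sketch of that suggestion (smallint is an assumption; use whatever type protobundle_id actually has):

```sql
-- On 7.4, an untyped integer literal can keep an index on a smallint
-- (or int8) column from being used; cast the constant to match:
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1::smallint;
```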

Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664


Nov 23 '05 #2
Kevin Murphy wrote:
[...]
->  Seq Scan on elements  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.172..405.243 rows=185535 loops=1)
The planner thinks that the sequential scan on elements will return 1000
rows, but it actually returned 185000. Did you ANALYZE this table recently?
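A sketch of the suggested fix, using the table names from the thread:

```sql
-- Refresh the planner's statistics, then re-check the plan:
ANALYZE build.elements;
ANALYZE superlocs_2;
ANALYZE bundle_superlocs_2;

EXPLAIN ANALYZE
SELECT bundle_id FROM build.elements
WHERE elementid IN (
    SELECT superlocs_2.element_id
    FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
    WHERE bundle_superlocs_2.protobundle_id = 1);
```

With accurate row counts the planner should prefer the index on elementid instead of hashing the whole elements table.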

Afterthought: It would be nice if the database was smart enough to
analyze a table of its own accord when a sequential scan returns more
than, say, 20 times what it was supposed to.

Paul

Nov 23 '05 #3
> Afterthought: It would be nice if the database was smart enough to
analyze a table of its own accord when a sequential scan returns more
than, say, 20 times what it was supposed to.


I've wondered on several occasions if there is any good reason for PG not
to automatically perform an analyze concurrently with a seq scan as it's
happening. That way, no extra disk IO is needed and the stats could stay
up-to-date for almost free.

Any hackers around who can say why this might be a bad idea, or is it one
of those things that just needs a volunteer? (I'm not; at least not now.)


Nov 23 '05 #4

"Arthur Ward" <aw**************@dominionsciences.com> writes:
Any hackers around who can say why this might be a bad idea, or is it one
of those things that just needs a volunteer? (I'm not; at least not now.)


a) that would make plans change spontaneously. I hate being paged in the
middle of the night because some query is suddenly being slow when it had been
performing fine before.

b) Not all sequential scans will actually complete the scan. There could be a
limit imposed, or the sequential scan could be inside an EXISTS. In that case
the scan could be aborted at any point.

What I do think would be easy to do would be to keep statistics on the expense
of various components of the cost estimates. cpu_*_cost, random_page_cost, and
effective_cache_size ought to be values that can be solved for empirically
from the timing results.

But that still doesn't have to be done on every query. There's a trade-off
between work done on every query to plan queries and the benefit. Gathering
statistics and storing them on every sequential scan is way too much work
slowing down every query for minimal gain.

--
greg

Nov 23 '05 #5
Paul Tillotson <pn***@shentel.net> writes:
The planner thinks that the sequential scan on elements will return 1000
rows, but it actually returned 185000. Did you ANALYZE this table recently?
Or either of the other ones? All those scan costs look like defaults
:-(
Afterthought: It would be nice if the database was smart enough to
analyze a table of its own accord when a sequential scan returns more
than, say, 20 times what it was supposed to.


I've thought about this before. One simple trick would be to get rid
of the current pg_class reltuples/relpages fields in favor of a
tuples-per-page estimate, which could be multiplied by
RelationGetNumberOfBlocks() during planning. In the absence of any
ANALYZE data the tuples-per-page estimate might be pretty bogus, but
it couldn't be off by more than an order of magnitude or so either way.
And in any case we'd have a guaranteed up-to-date number of blocks.
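The estimate Tom describes is simple arithmetic; here is a hedged sketch (the figures are hypothetical, picked to land near the 185535-row elements table above):

```python
def estimate_rows(num_blocks: int, tuples_per_page: float) -> int:
    """Planner-style row estimate: stored tuple density times the
    table's current size in pages (RelationGetNumberOfBlocks)."""
    return int(round(num_blocks * tuples_per_page))

# hypothetical figures: heap currently 1470 pages, last ANALYZE saw
# about 126.2 tuples per page
print(estimate_rows(1470, 126.2))
```

Because the block count is read at plan time, the estimate tracks table growth even when the stored stats are stale; only the density figure can drift.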

The objections that could be raised to this are (AFAICS) two:

1. Adding at least an lseek() kernel call per table, and per index, to
every planning operation. I'm not sure this would be significant,
but I'm not sure it wouldn't be, either.

2. Instability of plans. Right now, the planner will not change plans
underneath you --- you have to issue an explicit VACUUM or ANALYZE
to change the terms of discussion. That would stop being true if
physical file size were always taken into account. Maybe this is a
problem, or maybe it isn't ... as someone who likes to be able to
debug planner behavior without actually creating umpteen-gig test
tables, my world view may be a bit skewed ...

It's certainly doable if we decide the pluses outweigh the minuses.
Thoughts?

regards, tom lane


Nov 23 '05 #6

Tom Lane <tg*@sss.pgh.pa.us> writes:
I've thought about this before. One simple trick would be to get rid of the
current pg_class reltuples/relpages fields in favor of a tuples-per-page
estimate, which could be multiplied by RelationGetNumberOfBlocks() during
planning.
This would do something interesting to one of the problem cases I have now. I
have trouble testing a particular batch job that generates a large amount of
precalculated denormalized data.

That's because the data is empty when the job starts, so the plpgsql function
that handles the job caches plans based on an empty table. But as the job
proceeds the data grows and I'm afraid the cached plan may start performing
poorly.

In order to test it I need to run it once, analyze, then reload the function,
truncate the data and run it another time. And hope I generated good
representative test data.

I'm thinking of changing to a non-plpgsql implementation. But that's only half
the issue. I'm not about to run analyze in the middle of the data generation
(which wouldn't work anyways since it's in a transaction). So I can't really
get good statistics for this job, not until we're actually in a steady state
in production.

Sometimes I wonder whether I wouldn't rather a more predictable system with
less focus on statistics. The resulting plans would be more predictable and
predictability is a good thing in production systems...
In the absence of any ANALYZE data the tuples-per-page estimate might be
pretty bogus, but it couldn't be off by more than an order of magnitude or
so either way.
I don't see why it couldn't. If you have a table badly in need of vacuuming
(or had one at the time of the last analyze) it could be off by way more than
an order of magnitude.

For that matter, a table that had undergone many deletes and then been
vacuumed would not change length for a long time afterward even as many new
inserts are performed. Until the table is analyzed the estimated table size
could be off by an arbitrary factor.
The objections that could be raised to this are (AFAICS) two:

1. Adding at least an lseek() kernel call per table, and per index, to
every planning operation. I'm not sure this would be significant,
but I'm not sure it wouldn't be, either.
That seems like something that could be addressed with enough time. A single
value per table, couldn't it be stored in shared memory and only updated
whenever the heap is extended or truncated? Even if you have thousands of
tables that would only be a few kilobytes of shared memory.
2. Instability of plans. Right now, the planner will not change plans
underneath you --- you have to issue an explicit VACUUM or ANALYZE
to change the terms of discussion. That would stop being true if
physical file size were always taken into account. Maybe this is a
problem, or maybe it isn't ... as someone who likes to be able to
debug planner behavior without actually creating umpteen-gig test
tables, my world view may be a bit skewed ...


This is what I'm afraid of. As a OLTP application programmer -- web sites, I
admit -- I care a lot more about plan stability than finding optimal plans.

A well-written OLTP application will only have a fixed number of queries that
are executed repeatedly with different parameters. I don't care how long the
queries take as long as they're always "fast enough".

Every time a new plan is tried there's a chance that it will be wrong. It only
takes one wrong plan out of the hundreds of queries to bring down the entire
application.

Ideally I would want a guarantee that every query would *always* result in the
same plan. Once I've tested them and approved the plans I want to know that
only those approved plans will ever run, and I want to be present and be able
to verify new plans before they go into production.

I doubt I'm going to convince anyone today, but I think there will be a
gradual change in mindset as the new binary protocol becomes more popular. And
postgres takes over some of mysql's web mindshare.

--
greg

Nov 23 '05 #7
Tom Lane wrote:

2. Instability of plans. Right now, the planner will not change plans
underneath you --- you have to issue an explicit VACUUM or ANALYZE
to change the terms of discussion. That would stop being true if
physical file size were always taken into account. Maybe this is a
problem, or maybe it isn't ... as someone who likes to be able to
debug planner behavior without actually creating umpteen-gig test
tables, my world view may be a bit skewed ...


Did you forget the autovacuum daemon? I haven't seen anyone bitten, or
paged during the night, because of the autovacuum daemon's work.
Regards
Gaetano Mendola
Nov 23 '05 #8
Tom Lane wrote:
I've thought about this before. One simple trick would be to get rid
of the current pg_class reltuples/relpages fields in favor of a
tuples-per-page estimate, which could be multiplied by
RelationGetNumberOfBlocks() during planning. In the absence of any
ANALYZE data the tuples-per-page estimate might be pretty bogus, but
it couldn't be off by more than an order of magnitude or so either way.
And in any case we'd have a guaranteed up-to-date number of blocks.

The objections that could be raised to this are (AFAICS) two:
[snip]
2. Instability of plans. Right now, the planner will not change plans
underneath you --- you have to issue an explicit VACUUM or ANALYZE
to change the terms of discussion. That would stop being true if
physical file size were always taken into account. Maybe this is a
problem, or maybe it isn't ... as someone who likes to be able to
debug planner behavior without actually creating umpteen-gig test
tables, my world view may be a bit skewed ...

It's certainly doable if we decide the pluses outweigh the minuses.
Thoughts?


My first reaction is to wonder if this would give performance exactly
equal to running a true ANALYZE in every situation? If not, then you
would end up with an automated pseudo-ANALYZE (performance-wise).

In my opinion, it is almost a feature that non-ANALYZE-d tables give
such horrendous performance, it kicks you in the butt to do some
thinking about when to correctly deal with ANALYZEing.

So, in short, I think it is a huge win if we could have automatic
ANALYZE with true ANALYZE performance, but a huge loss if the automatic
ANALYZE performance is not exactly as good as a true ANALYZE.

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
Jon Lapham <la****@jandr.org> Rio de Janeiro, Brasil
Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------

Nov 23 '05 #9
Thanks all for the reminders about analyzing, and I apologize for wasting
everyone's time. The main table did indeed need to be analyzed (I had
truncated it and repopulated it with "insert ... select" but forgotten to
analyze). The other tables are very small temporary tables, and I assumed,
whether correctly or not, that analyzing would not be helpful for them.

All this is happening in the context of an algorithm in a PL/PGSQL function.
The three temporary tables are reused thousands of times. I wasn't sure if
it would be better to truncate them between uses to keep them small or just
allow them to grow. Unfortunately for the Tree of Knowledge, performance is
now more than adequate, so I may not do this experiment.
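The untried experiment would compare something like these two variants (tmp_superlocs is a hypothetical stand-in for the three temp tables):

```sql
-- Variant A: keep the temp table small by clearing it between iterations
TRUNCATE tmp_superlocs;

-- Variant B: let it grow, but re-ANALYZE occasionally so the planner's
-- row estimates keep pace with the actual size
ANALYZE tmp_superlocs;
```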

Thanks,
Kevin Murphy


Nov 23 '05 #10
2. Instability of plans. Right now, the planner will not change plans
underneath you --- you have to issue an explicit VACUUM or ANALYZE
to change the terms of discussion. That would stop being true if
physical file size were always taken into account. Maybe this is a
problem, or maybe it isn't ... as someone who likes to be able to
debug planner behavior without actually creating umpteen-gig test
tables, my world view may be a bit skewed ...

It's certainly doable if we decide the pluses outweigh the minuses.
Thoughts?

Even though I just proposed it, now I think that the idea of having
postgres automatically gather statistics only when an estimate is
grossly wrong is not a good one. This will just obscure the need to run
analyze, and many people will then always be using moderately bad
statistics without realizing it.

(I think I used postgres in a production environment for about six
months without even knowing about the ANALYZE command--with small data
sets it's not obvious--and it was still blazingly fast compared to foxpro.)

If pgautovacuum does ANALYZE and we can get most people to use
pgautovacuum, I think this problem will go away.

Another possibility is simply to make postgres generate a warning, saying:

WARNING: statistics for table xxxxxx are off by a factor of y. You
should ANALYZE this table.

Paul


Nov 23 '05 #11
Jon Lapham <la****@jandr.org> writes:
Tom Lane wrote:
I've thought about this before. One simple trick would be to get rid
of the current pg_class reltuples/relpages fields in favor of a
tuples-per-page estimate, which could be multiplied by
RelationGetNumberOfBlocks() during planning.
My first reaction is to wonder if this would give performance exactly
equal to running a true ANALYZE in every situation?


No, this is quite orthogonal to ANALYZE (and several orders of magnitude
cheaper...) It would fix a lot of the simpler cases, though, since the
first-order effects like table size would be right.

regards, tom lane


Nov 23 '05 #12
Greg Stark <gs*****@mit.edu> writes:
I'm not about to run analyze in the middle of the data generation
(which wouldn't work anyways since it's in a transaction).
Since 7.3 or 7.4, you *can* run ANALYZE in the middle of a transaction.
The cached-plan business is a problem, I agree, but I think it's
orthogonal to this particular discussion (and you can always use EXECUTE
if you have to).
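A hedged sketch of the 7.4-era plpgsql EXECUTE escape hatch (the function and query are hypothetical; a dynamic statement is re-planned on every call instead of using the session's cached plan):

```sql
CREATE OR REPLACE FUNCTION count_bundle(integer) RETURNS bigint AS '
DECLARE
    r RECORD;
BEGIN
    -- FOR ... IN EXECUTE builds and plans the statement at run time,
    -- so it sees whatever statistics exist when it actually runs:
    FOR r IN EXECUTE ''SELECT count(*) AS n FROM build.elements
                       WHERE bundle_id = '' || $1 LOOP
        RETURN r.n;
    END LOOP;
    RETURN 0;
END;
' LANGUAGE plpgsql;
```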
In the absence of any ANALYZE data the tuples-per-page estimate might be
pretty bogus, but it couldn't be off by more than an order of magnitude or
so either way.

I don't see why it couldn't. If you have a table badly in need of vacuuming
(or had one at the time of the last analyze) it could be off by way more than
an order of magnitude.
Well, I was actually thinking of the physical tuples-per-page stat
(perhaps better expressed as an average tuple size), but you are right
that the fraction of dead tuples is also something to think about.
We don't model that explicitly ATM but maybe we should. The original
VACUUM-based stats code couldn't really do much with it, since VACUUM
would leave no dead tuples behind in the first place; but separate
ANALYZE could definitely make an estimate of the fraction of dead tuples.
Ideally I would want a guarantee that every query would *always*
result in the same plan. Once I've tested them and approved the plans
I want to know that only those approved plans will ever run, and I
want to be present and be able to verify new plans before they go into
production. I doubt I'm going to convince anyone today,


Nope, you aren't. The above seems to me to be a recipe for degradation
of performance over time, precisely because the plans wouldn't change in
the face of changes in the situation. I've resisted adding "planner
hints" to the language for this reason, and I'm certainly not eager to
offer any hard guarantees about plans not changing.

regards, tom lane


Nov 23 '05 #13

Is there anything for the TODO here?

---------------------------------------------------------------------------

Tom Lane wrote:
[...]
Nope, you aren't. The above seems to me to be a recipe for degradation
of performance over time, precisely because the plans wouldn't change in
the face of changes in the situation. I've resisted adding "planner
hints" to the language for this reason, and I'm certainly not eager to
offer any hard guarantees about plans not changing.

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


Nov 23 '05 #14


Tom Lane wrote:
[...]
Since 7.3 or 7.4, you *can* run ANALYZE in the middle of a transaction.
The cached-plan business is a problem, I agree, but I think it's
orthogonal to this particular discussion (and you can always use EXECUTE
if you have to).


How does EXECUTE solve the cached-plan business?


Nov 23 '05 #15
On Fri, 27 Aug 2004 11:09:26 -0400, Joseph Shraibman
<jk*@selectacast.net> wrote:
How does EXECUTE solve the cached-plan business?


It re-plans the query at every run.

--
Markus Bertheau <mb*******@gmail.com>


Nov 23 '05 #16
According to the docs it specifically doesn't.

http://www.postgresql.org/docs/7.4/s...l-prepare.html

When the PREPARE statement is executed, the specified statement is
parsed, rewritten, and planned. When an EXECUTE command is subsequently
issued, the prepared statement need only be executed. Thus, the parsing,
rewriting, and planning stages are only performed once, instead of every
time the statement is executed.
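The SQL-level behavior the docs describe, as a minimal sketch (the statement name is made up; the values come from the thread's example):

```sql
-- PREPARE parses, rewrites, and plans exactly once:
PREPARE get_bundle (integer) AS
    SELECT bundle_id FROM build.elements WHERE elementid = $1;

-- Each EXECUTE only runs the already-made plan:
EXECUTE get_bundle(41209);
EXECUTE get_bundle(25047);
```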

Markus Bertheau wrote:
On Fri, 27 Aug 2004 11:09:26 -0400, Joseph Shraibman
<jk*@selectacast.net> wrote:

How does EXECUTE solve the cached-plan business?

It re-plans the query at every run.



Nov 23 '05 #17
Joseph Shraibman <jk*@selectacast.net> writes:
Markus Bertheau wrote:
It re-plans the query at every run.
According to the docs it specifically doesn't.


You're confusing SQL EXECUTE statement with the quite different plpgsql
EXECUTE statement ...

regards, tom lane


Nov 23 '05 #18
Paul Tillotson wrote:
If pgautovacuum does ANALYZE and we can get most people to use
pgautovacuum, I think this problem will go away.


It does. We won't get most people to use it for 8.0 since it is still a
contrib project, but hopefully it will be built-in for 8.1.

Nov 23 '05 #19
Tom Lane <tg*@sss.pgh.pa.us> writes:
Greg Stark <gs*****@mit.edu> writes:
I'm not about to run analyze in the middle of the data generation
(which wouldn't work anyways since it's in a transaction).
Since 7.3 or 7.4, you *can* run ANALYZE in the middle of a transaction.
The cached-plan business is a problem, I agree, but I think it's
orthogonal to this particular discussion (and you can always use EXECUTE
if you have to).


It's orthogonal. My point was that I have a bigger problem, but even if I
address it by switching away from plpgsql, or I guess by using EXECUTE, I
would still have a problem. I didn't realize you could run analyze in a
transaction, but even being able to I wouldn't really want to have to do that
repeatedly during the job.

This new approach would actually complete the fix; a perl or plpgsql EXECUTE
implementation would gradually shift statistics during the job.

Except that the first thing the job does is delete all the old records. This
is inside a transaction. So an estimate based on the heap size would be off by
a factor of two by the time the job is done.
but separate ANALYZE could definitely make an estimate of the fraction of
dead tuples.
With analyze in a transaction I'm not clear what the semantics should be
though. I suppose it should only count tuples visible to the transaction
running the ANALYZE?
Nope, you aren't. The above seems to me to be a recipe for degradation
of performance over time, precisely because the plans wouldn't change in
the face of changes in the situation.
A gradual degradation is ok. A gradual degradation means I can schedule a
nightly analyze and report on any changed plans and either automatically
accept them or manually approve them individually.

A sudden degradation is much more dangerous. Even if it's rare, a sudden
degradation means an outage in prime time.

As I said, it doesn't matter to me if every query is 10% slower than possible,
as long as no query takes 1000% as long as necessary even if it's a 1 in 1000
occurrence.
I've resisted adding "planner hints" to the language for this reason, and
I'm certainly not eager to offer any hard guarantees about plans not
changing.


I just want to control _when_ they change. Eventually you'll come around. I
think it'll be a slow gradual change in thinking as the user-base changes
though. Not something I'll change with a single argument in one day.

--
greg

Nov 23 '05 #20
Greg Stark <gs*****@mit.edu> writes:
It's orthogonal. My point was that I have a bigger problem, but even if I
address it by switching away from plpgsql, or I guess by using EXECUTE, I
would still have a problem. I didn't realize you could run analyze in a
transaction, but even being able to I wouldn't really want to have to do that
repeatedly during the job.
Why not? Given the sampling behavior that's been in there for a release
or two, ANALYZE is pretty cheap on large tables; certainly much cheaper
than any processing you might be doing that's going to grovel over the
whole table.
Except that the first thing the job does is delete all the old records. This
is inside a transaction. So an estimate based on the heap size would be off by
a factor of two by the time the job is done.
Could you use TRUNCATE? I dunno if locking the table is okay for you.
It is transaction safe though.
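Since TRUNCATE is transactional in 7.4, the pattern Tom suggests can look like this (table name hypothetical):

```sql
BEGIN;
TRUNCATE old_records;   -- takes an exclusive lock on the table
-- ... regenerate the denormalized data ...
COMMIT;                 -- or ROLLBACK, which undoes the TRUNCATE as well
```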
With analyze in a transaction I'm not clear what the semantics should be
though. I suppose it should only count tuples visible to the transaction
analyze?
It currently uses SnapshotNow, so would see committed tuples of other
transactions plus uncommitted ones of the present transaction. This is
not exactly the same thing as the transaction's snapshot, but close.
A sudden degradation is much more dangerous. Even if it's rare, a sudden
degradation means an outage in prime time.


[ shrug ] You can get a sudden degradation with fixed plans, too. All
it takes is an addition of a lot of rows in some table that had been
small.

regards, tom lane


Nov 23 '05 #21
Greg Stark wrote:
Ideally I would want a guarantee that every query would *always* result in the
same plan. Once I've tested them and approved the plans I want to know that
only those approved plans will ever run, and I want to be present and be able
to verify new plans before they go into production.
What you are saying is "never run an ANALYZE", or if you do, you have to re-test
all your plans. "*Always* the same plan" is nonsense, because the plan depends on the
data distribution; do you test your plans for each histogram slice?
I doubt I'm going to convince anyone today...


For sure not me.
Regards
Gaetano Mendola

Nov 23 '05 #22

Tom Lane <tg*@sss.pgh.pa.us> writes:
Except that the first thing the job does is delete all the old records. This
is inside a transaction. So an estimate based on the heap size would be off by
a factor of two by the time the job is done.
Could you use TRUNCATE? I dunno if locking the table is okay for you.
It is transaction safe though.


Well, if necessary I could, but if I can do it without downtime all the
better. In any case I think I'll be ok with a factor of 2 misestimation. I was
just giving an example use case for you to chew on when analyzing this new
proposal.

I'm not sure where I stand with the idea. I like the idea that table sizes
would always be fairly reasonable even without statistics. But I also have a
really strong desire for plan stability.
[ shrug ] You can get a sudden degradation with fixed plans, too. All
it takes is an addition of a lot of rows in some table that had been
small.


Well, presumably I should be aware if my data distribution is changing
drastically. That's under my control. At least the performance change will be
proportionate to the distribution change.

With plans changing on the fly I could have a query that degrades 1% for every
row added and then suddenly becomes 10x slower when I add a 17th extra row. Of
course such a system isn't perfectly tuned, or the optimizer issue should be
found and fixed. But I would rather find out about it without having my
application fail.

--
greg

Nov 23 '05 #23
