Strange count(*) implementation?

Henk Ernst Blok

Hi Posgres users/developers,

Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
table scan to compute a count(*) on a base table after a vacuum analyze
has been done with no following updates that might have outdated any
statistics. Strangly the explain command does give the correct number of
tuples instantaniously from the catalog, as one would expect. Still the
optimizer thinks it needs a full table scan to do count.

See example below:

------8<---------------------------------------------------------------------------------------------

TestDB=# \d test_tbl;
Table "public.test_tbl"
Column | Type | Modifiers
--------+---------+-----------
pre | integer | not null
name | text | not null
Indexes:
"test_tbl_pkey" primary key, btree (pre)
"test_tbl_pre_index" unique, btree (pre)
"test_tbl_name_index" btree (name)

TestDB=# explain select count(*) from test_tbl;
QUERY PLAN
----------------------------------------------------------------------------
Aggregate (cost=34293.60..34293.60 rows=1 width=0)
-> Seq Scan on test_tbl (cost=0.00..34293.60 rows=166558 width=0)
(2 rows)

Time: 25.188 ms
TestDB=# select count(*) from test_tbl;
count
--------
166558
(1 row)

Time: 1024.803 ms
TestDB=#

------8<---------------------------------------------------------------------------------------------

The consequence of this seemingly odd count implementation is a very
very slow count.

Regards,
Henk Ernst Blok

--
address: DB group, Computer Science, EEMCS Dept., University of Twente,
PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone: ++31 (0)53 489 3754 (if no response: 3690)
email: h.******@utwente.nl
WWW: http://www.cs.utwente.nl/~blokh
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 23 '05 #1

Subscribe Post Reply

4384

Richard Huxton

Henk Ernst Blok wrote:

Hi Posgres users/developers,

Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
table scan to compute a count(*) on a base table after a vacuum analyze
has been done with no following updates that might have outdated any
statistics. Strangly the explain command does give the correct number of
tuples instantaniously from the catalog, as one would expect. Still the
optimizer thinks it needs a full table scan to do count. The consequence of this seemingly odd count implementation is a very
very slow count.

To put it simply, count() doesn't look at the statistics because most of
the time they are out of date. In any case, they aren't useful for any
query with a WHERE clause.

The next most obvious choice is to use the primary-key index rather than
scanning the table. However, MVCC means that we can have multiple views
of the table, with some backends seeing a different number of rows than
others. So - either we need to store multiple index-entry versions as
well as multiple row versions or you need to check the actual row in
these cases. PostgreSQL does the second, which results in the full scan
which you see.

There is plenty of discussion of this (and also max()/min() aggregate
functions) in the mailing list archives.

HTH
--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #2

Tino Wildenhain

hi,

On Tue, 2004-10-26 at 10:16, Henk Ernst Blok wrote:

Hi Posgres users/developers,

Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
table scan to compute a count(*) on a base table after a vacuum analyze
has been done with no following updates that might have outdated any
statistics. Strangly the explain command does give the correct number of
tuples instantaniously from the catalog, as one would expect. Still the
optimizer thinks it needs a full table scan to do count.
.... The consequence of this seemingly odd count implementation is a very
very slow count.

How should the query planner know the vacuum was recent enough and there
were no modifications to the table since?

If you are interested in rough numbers you could read the system tables
for the last vacuum statistics. If you need fast count and can spend
some cycles on inserts, just make a buffer table with count results
after insert.

Unqualified count e.g. without a WHERE clause should not need to
be used a lot.

Regards
Tino
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #3

Henk Ernst Blok

Hi,

My question was more of a fundamental nature as this count by scan
seemed to contradict the theory about how to optimize it.

I assume(d) the more expensive statistics (e.g., value distribution
info) are updated only when outdated too much or on request (manual
vacuum). Usually, other/cheap statistics can easily be maintained
incrementally and thus reflect actual table state after each update. Of
course, the MVCC principle seems to make things a bit more complicated I
understand now. But tracking whether statistics are dirty has to be in
the system anyway. How does it otherwise decide when to do its own
statistics updates? So if explain can get the most recent count, why
not use it in the count as well if you know the statistics are still
acurate?

By the way, a count(*) without any where does occur very frequently if
you are dealing with an OLAP load, which is the case in my setting. So,
I indeed already 'fixed' the performance problem by precomputing all
counts I can predict to be of any use.

Anyway, I understood this issue has been subject to discusion before I
was on the list (searching the archive/website was/is not very
effective, so I didn't know until someone told me so, sorry). So, I
leave it to the developers what to do with this topic.

Regards,

Henk Ernst
Tino Wildenhain wrote:

hi,

On Tue, 2004-10-26 at 10:16, Henk Ernst Blok wrote:

Hi Posgres users/developers,

Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
table scan to compute a count(*) on a base table after a vacuum analyze
has been done with no following updates that might have outdated any
statistics. Strangly the explain command does give the correct number of
tuples instantaniously from the catalog, as one would expect. Still the
optimizer thinks it needs a full table scan to do count.

...

The consequence of this seemingly odd count implementation is a very
very slow count.

How should the query planner know the vacuum was recent enough and there
were no modifications to the table since?

If you are interested in rough numbers you could read the system tables
for the last vacuum statistics. If you need fast count and can spend
some cycles on inserts, just make a buffer table with count results
after insert.

Unqualified count e.g. without a WHERE clause should not need to
be used a lot.

Regards
Tino

--
address: DB group, Computer Science, EEMCS Dept., University of Twente,
PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone: ++31 (0)53 489 3754 (if no response: 3690)
email: h.******@utwente.nl
WWW: http://www.cs.utwente.nl/~blokh

Nov 23 '05 #4

Neil Conway

Henk Ernst Blok wrote:

I assume(d) the more expensive statistics (e.g., value distribution
info) are updated only when outdated too much or on request (manual
vacuum).
They are only updated on request -- i.e. when an ANALYZE is issued.
So if explain can get the most recent count, why
not use it in the count as well if you know the statistics are still
acurate?

Aside from the issue of stale statistics, there is another problem:
optimizer statistics are designed to be approximations. They are not
necessarily precise, even if ANALYZE has just been run (for example,
pg_class.reltuples is stored as a floating point number).

A practical problem is that aggregates like count() are implemented via
a general-purpose API; there is currently no provision for bypassing the
API in certain special case scenarios. See here for more info:

http://developer.postgresql.org/docs...aggregate.html

-Neil

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #5

Alvaro Herrera

On Tue, Oct 26, 2004 at 01:56:41PM +0200, Henk Ernst Blok wrote:

Hi,

I assume(d) the more expensive statistics (e.g., value distribution
info) are updated only when outdated too much or on request (manual
vacuum). Usually, other/cheap statistics can easily be maintained
incrementally and thus reflect actual table state after each update. Of
course, the MVCC principle seems to make things a bit more complicated I
understand now.

It's not only MVCC; it's also the fact that aggregates are extensible.
So to the system they are just opaque functions and it doesn't know how
to optimize them.

Of course this can be done, e.g. by supplying an optimizing function with
each aggregate that would try to convert the step-by-step aggregate
execution plan into a completely different operation (say by looking at
an index to obtain a max() value), but as you see this is not a trivial
task, and more importantly, it hasn't been done. Which is to mean, if
you want to try and submit a patch, it could be improved in the future.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There is evil in the world. There are dark, awful things. Occasionally, we get
a glimpse of them. But there are dark corners; horrors almost impossible to
imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972)
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #6

Tino Wildenhain

On Tue, 2004-10-26 at 13:56, Henk Ernst Blok wrote:

Hi,

My question was more of a fundamental nature as this count by scan
seemed to contradict the theory about how to optimize it.
It is hard or next to impossible to optimize count() or more generally
aggregates in a MVCC environment. Note also all the cases where
you have a qualified select to calculate the aggregate.
I assume(d) the more expensive statistics (e.g., value distribution
info) are updated only when outdated too much or on request (manual
vacuum). Usually, other/cheap statistics can easily be maintained
incrementally and thus reflect actual table state after each update.
I remember some discussion about this. But also here the MVCC and
the general call for performance leads to the current solution
where statistics are only updated on vacuum. With PG8.0 you have
vacuum strategies with better performance which can run more often
as I understand - so while not giving you exact figures your
count() could be estimated at least.
Of course, the MVCC principle seems to make things a bit more
complicated I understand now. But tracking whether statistics are
dirty has to be in the system anyway. How does it otherwise decide
when to do its own statistics updates? So if explain can get the most
recent count, why not use it in the count as well if you know the
statistics are still acurate?
The point is: you dont know it. There is curently no: mark statistics
dirty if table has new tuples or tuples removed.
By the way, a count(*) without any where does occur very frequently if
you are dealing with an OLAP load, which is the case in my setting.
So, I indeed already 'fixed' the performance problem by precomputing
all counts I can predict to be of any use.
I'm not familar with OLAP specifics, so what is the meaning of the
count() here? What is done with this information?
Anyway, I understood this issue has been subject to discusion before I
was on the list (searching the archive/website was/is not very
effective, so I didn't know until someone told me so, sorry). So, I
leave it to the developers what to do with this topic.

Yes, very very often I can tell you :-)
Really this is an FAQ :-)

Regards
Tino
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #7

Henk Ernst Blok

Hi Tino,

Tino Wildenhain wrote:

I assume(d) the more expensive statistics (e.g., value distribution
info) are updated only when outdated too much or on request (manual
vacuum). Usually, other/cheap statistics can easily be maintained
incrementally and thus reflect actual table state after each update.

I remember some discussion about this. But also here the MVCC and
the general call for performance leads to the current solution
where statistics are only updated on vacuum. With PG8.0 you have
vacuum strategies with better performance which can run more often
as I understand - so while not giving you exact figures your
count() could be estimated at least.

I need exact figures at the moment due to the type of some scientific
experiments I'm running. Approximations are in the pipeline but I need a
base-line run without any tricks that affect the accuracy of the numbers
involved.

Of course, the MVCC principle seems to make things a bit more
complicated I understand now. But tracking whether statistics are
dirty has to be in the system anyway. How does it otherwise decide
when to do its own statistics updates? So if explain can get the most
recent count, why not use it in the count as well if you know the
statistics are still acurate?

The point is: you dont know it. There is curently no: mark statistics
dirty if table has new tuples or tuples removed.

OK. Didn't know that, but got the impression by now it worked that way
in Postgres.

By the way, a count(*) without any where does occur very frequently if
you are dealing with an OLAP load, which is the case in my setting.
So, I indeed already 'fixed' the performance problem by precomputing
all counts I can predict to be of any use.

I'm not familar with OLAP specifics, so what is the meaning of the
count() here? What is done with this information?

OLAP stands for OnLine Analystical Processing, typically meaning heavy
queries with a lot of data (for typical examples, see: http://www.tpc.org/
the TPC-H query set in particular). So decision support and datamining
are in that area for instance. My topic of interest is IR (information
retrieval) in a database context. My experiments behave like an OLAP
load at the moment. My current experiments involve a lot of counting and
expensive joins as I have to compute certain estimators in a
mathematical model I'm working on, hence the importance of the count... ;)
On MySQL each of the 30 queries I have to run took on average about 24
h. As my queries are getting even complexer I'm now trying to find out
whether Postgres can do a better job.

Regards,
Henk Ernst

--
address: DB group, Computer Science, EEMCS Dept., University of Twente,
PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone: ++31 (0)53 489 3754 (if no response: 3690)
email: h.******@utwente.nl
WWW: http://www.cs.utwente.nl/~blokh

Nov 23 '05 #8

Tino Wildenhain

Hi,

On Tue, 2004-10-26 at 15:25, Henk Ernst Blok wrote:
....

the TPC-H query set in particular). So decision support and datamining
are in that area for instance. My topic of interest is IR (information
retrieval) in a database context. My experiments behave like an OLAP
load at the moment. My current experiments involve a lot of counting
and expensive joins as I have to compute certain estimators in a
mathematical model I'm working on, hence the importance of the
count... ;)
On MySQL each of the 30 queries I have to run took on average about 24
h. As my queries are getting even complexer I'm now trying to find out
whether Postgres can do a better job.

In your specific application if you have not many inserts
or have a phase where you do the inserts and another distinct
phase where you do the analysis, you should be able to
count() just before your analysis. If not you can always
experiment with triggers to count() in an optimized way
using just another table to store the count value
for every table you need.

INSERT/DELETE via function, use a trigger and/or RULES.
This should do the trick.
Maybe you can precalculate a lot more - depending on
the algorithms you use.

Regards
Tino
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #9

Henk Ernst Blok

Tino,

Thanks for the sugestion about exploiting the rules system. I hadn't
thought about that option yet. Currently I'm trying to pre-compute as
much as possible.

Regards,
Henk Ernst

Tino Wildenhain wrote:

Hi,

On Tue, 2004-10-26 at 15:25, Henk Ernst Blok wrote:
...

the TPC-H query set in particular). So decision support and datamining
are in that area for instance. My topic of interest is IR (information
retrieval) in a database context. My experiments behave like an OLAP
load at the moment. My current experiments involve a lot of counting
and expensive joins as I have to compute certain estimators in a
mathematical model I'm working on, hence the importance of the
count... ;)
On MySQL each of the 30 queries I have to run took on average about 24
h. As my queries are getting even complexer I'm now trying to find out
whether Postgres can do a better job.

In your specific application if you have not many inserts
or have a phase where you do the inserts and another distinct
phase where you do the analysis, you should be able to
count() just before your analysis. If not you can always
experiment with triggers to count() in an optimized way
using just another table to store the count value
for every table you need.

INSERT/DELETE via function, use a trigger and/or RULES.
This should do the trick.
Maybe you can precalculate a lot more - depending on
the algorithms you use.

Regards
Tino
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #10

Henk Ernst Blok

Hi Alvaro,

I used to do some research in extensibility of query optimizers to match
the extensibility of the operators. However, it's not really in the
focus of my research anymore so I can't spend much time on it,
unfortunately. I'll keep it in mind in case a student of the group where
I'm working is looking for an project.

Regards,
Henk Ernst.

Alvaro Herrera wrote:

On Tue, Oct 26, 2004 at 01:56:41PM +0200, Henk Ernst Blok wrote:

Hi,

I assume(d) the more expensive statistics (e.g., value distribution
info) are updated only when outdated too much or on request (manual
vacuum). Usually, other/cheap statistics can easily be maintained
incrementally and thus reflect actual table state after each update. Of
course, the MVCC principle seems to make things a bit more complicated I
understand now.

It's not only MVCC; it's also the fact that aggregates are extensible.
So to the system they are just opaque functions and it doesn't know how
to optimize them.

Of course this can be done, e.g. by supplying an optimizing function with
each aggregate that would try to convert the step-by-step aggregate
execution plan into a completely different operation (say by looking at
an index to obtain a max() value), but as you see this is not a trivial
task, and more importantly, it hasn't been done. Which is to mean, if
you want to try and submit a patch, it could be improved in the future.

Nov 23 '05 #11

Similar topics