Bytes IT Community

why does count take so long?

On a 7.3.4 database:

explain analyse select count(*) from elog;
Aggregate (cost=223764.05..223764.05 rows=1 width=0) (actual
time=81372.11..81372.11 rows=1 loops=1)
-> Seq Scan on elog (cost=0.00..203012.24 rows=8300724 width=0)
(actual time=3.91..71542.53 rows=8313762 loops=1)
Total runtime: 81378.42 msec
(3 rows)
It looks like the aggregate took 10 secs all by itself. What's taking
so long?
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 11 '05 #1
22 Replies


On Sun, Sep 07, 2003 at 21:06:26 -0400,
Joseph Shraibman <jk*@selectacast.net> wrote:
On a 7.3.4 database:

explain analyse select count(*) from elog;
Aggregate (cost=223764.05..223764.05 rows=1 width=0) (actual
time=81372.11..81372.11 rows=1 loops=1)
-> Seq Scan on elog (cost=0.00..203012.24 rows=8300724 width=0)
(actual time=3.91..71542.53 rows=8313762 loops=1)
Total runtime: 81378.42 msec
(3 rows)
It looks like the aggregate took 10 secs all by itself. What's taking
so long?


It looks like there are 8 million log records that need to be counted.

There have been some discussions over the last few days about tricks to
get better performance if you need to use count on large tables.
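
One such trick, if an approximation is good enough: read the planner's own statistics instead of scanning. pg_class.reltuples is only as fresh as the last VACUUM or ANALYZE on the table, but it comes back instantly:

```sql
-- Approximate row count from the planner's statistics.
-- Only as accurate as the last VACUUM or ANALYZE on the table.
SELECT reltuples FROM pg_class WHERE relname = 'elog';
```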


Nov 11 '05 #2

Bruno Wolff III <br***@wolff.to> writes:
Joseph Shraibman <jk*@selectacast.net> wrote:
It looks like the aggregate took 10 secs all by itself. What's taking
so long?
It looks like there are 8 million log records that need to be counted.


Yeah, but I think he's complaining about the 10sec delta for the
aggregate on top of the 71sec to read the 8 million rows. That
seems high to me too. On a 10-mil-row test table, I get

regression=# explain analyze select count(*) from foo;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Aggregate (cost=22.50..22.50 rows=1 width=0) (actual time=189865.81..189865.81 rows=1 loops=1)
-> Seq Scan on foo (cost=0.00..20.00 rows=1000 width=0) (actual time=18.88..163833.61 rows=10240000 loops=1)
Total runtime: 189865.91 msec
(3 rows)

in other words 26sec to do the aggregate on top of 163sec to read the
rows.

Unless Joseph's machine has a way better IO-to-CPU ratio than my little
development machine, there's something odd about his numbers.

regards, tom lane


Nov 11 '05 #3


Tom Lane <tg*@sss.pgh.pa.us> writes:
Yeah, but I think he's complaining about the 10sec delta for the
aggregate on top of the 71sec to read the 8 million rows. That
seems high to me too. On a 10-mil-row test table, I get .... in other words 26sec to do the aggregate on top of 163sec to read the
rows.

Unless Joseph's machine has a way better IO-to-CPU ratio than my little
development machine, there's something odd about his numbers.
Why is 10s (a 14% delta) for 8M records suspicious but 26s (16% delta) for 10M
not suspicious? These results seem fairly consistent actually.

I think the original question was "what work does this 10s represent".
I'm curious too. Is it really just 10 million times the cpu cycles necessary
to dispatch a call to the count() aggregate state function?

PS:
regression=# explain analyze select count(*) from foo;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Aggregate (cost=22.50..22.50 rows=1 width=0) (actual time=189865.81..189865.81 rows=1 loops=1)
-> Seq Scan on foo (cost=0.00..20.00 rows=1000 width=0) (actual time=18.88..163833.61 rows=10240000 loops=1)
Total runtime: 189865.91 msec
(3 rows)


Hey, you haven't analyzed your table! :)

--
greg

Nov 11 '05 #4

Greg Stark <gs*****@mit.edu> writes:
Tom Lane <tg*@sss.pgh.pa.us> writes:
Why is 10s (a 14% delta) for 8M records suspicious but 26s (16% delta) for 10M
not suspicious? These results seem fairly consistent actually.
Duh --- must have gotten the decimal point off in my mental arithmetic.
I think what the original question was is "what work does this 10s represent".
I'm curious too. Is it really just 10 million times the cpu cycles necessary
to dispatch a call to the count() aggregate state function?


Well, it's 10 mil cycles of the aggregate plan node, which is going to
involve rather more work than just the aggregate function call. But the
last time I profiled COUNT(), it seemed that the pallocs/pfrees needed
for the int8 counter were the largest single chunk of CPU time.

Something I've wanted to do for awhile is to look at making int8 and
float8 be pass-by-value datatypes on machines where Datum is naturally
8 bytes (ie, any 8-byte-pointer architecture). I doubt it would be a
win to widen Datum on 32-bit machines, though; the distributed costs
would swamp the advantage from making these datatypes more efficient.

regards, tom lane


Nov 11 '05 #5


Tom Lane <tg*@sss.pgh.pa.us> writes:
Something I've wanted to do for awhile is to look at making int8 and
float8 be pass-by-value datatypes on machines where Datum is naturally
8 bytes (ie, any 8-byte-pointer architecture). I doubt it would be a
win to widen Datum on 32-bit machines, though; the distributed costs
would swamp the advantage from making these datatypes more efficient.


Things like count(*) could use int4 until it overflows though.
Is int4 a pass-by-value datatype on 32-bit machines?

--
greg

Nov 11 '05 #6

Greg Stark <gs*****@mit.edu> writes:
Things like count(*) could use int4 until it overflows though.
I don't see a reasonable way for an aggregate to change state datatype
on the fly; otherwise this would be a great solution.
Is int4 a pass-by-value datatype on 32-bit machines?


Yeah. Not too long ago, count() used int4, but we got some complaints
about it overflowing on big tables.

regards, tom lane


Nov 11 '05 #7

I said:
Greg Stark <gs*****@mit.edu> writes:
Things like count(*) could use int4 until it overflows though.
I don't see a reasonable way for an aggregate to change state datatype
on the fly; otherwise this would be a great solution.


On the other hand, the cost is imposed by the generic aggregate
definition that says the aggregate state transition function is an
ordinary function. This is fine for user-defined aggregates, but there
is no law that says that all the built-in aggregates must use that same
API. We could probably invent some API that allows COUNT(*) to keep its
running count someplace where it needn't be re-palloc'd on every cycle.
Something to think about for 7.5 (too late for 7.4 I fear).

regards, tom lane


Nov 11 '05 #8

Tom Lane wrote:
Unless Joseph's machine has a way better IO-to-CPU ratio than my little
development machine, there's something odd about his numbers.

regards, tom lane

Doing some math:

81372.11 / 71542.53 : 1.1373949174008804
189865.81 / 163833.61 : 1.1588941365572059

So I have count() taking 14% over the seqscan, and you have 16%.

Nov 11 '05 #9

How about keeping counts of inserts, deletes and updates per table per
transaction as part of the live statistics?

Tom Lane wrote:

I said:
Greg Stark <gs*****@mit.edu> writes:
Things like count(*) could use int4 until it overflows though.

I don't see a reasonable way for an aggregate to change state datatype
on the fly; otherwise this would be a great solution.


On the other hand, the cost is imposed by the generic aggregate
definition that says the aggregate state transition function is an
ordinary function. This is fine for user-defined aggregates, but there
is no law that says that all the built-in aggregates must use that same
API. We could probably invent some API that allows COUNT(*) to keep its
running count someplace where it needn't be re-palloc'd on every cycle.
Something to think about for 7.5 (too late for 7.4 I fear).

regards, tom lane


Nov 11 '05 #10

Is improving count(*) (a lot, if possible) on one of the TODO lists?

Jean-Luc Lachance wrote:
How about keeping counts of inserts, deletes and updates per table per
transaction as part of the live statistics?

Tom Lane wrote:

I said:

Greg Stark <gs*****@mit.edu> writes:
Things like count(*) could use int4 until it overflows though.
I don't see a reasonable way for an aggregate to change state datatype
on the fly; otherwise this would be a great solution.

On the other hand, the cost is imposed by the generic aggregate
definition that says the aggregate state transition function is an
ordinary function. This is fine for user-defined aggregates, but there
is no law that says that all the built-in aggregates must use that same
API. We could probably invent some API that allows COUNT(*) to keep its
running count someplace where it needn't be re-palloc'd on every cycle.
Something to think about for 7.5 (too late for 7.4 I fear).

regards, tom lane


Nov 11 '05 #11

The problem with that approach is that while there may only be one or two
tables that you want to do max(field) on, it would require that all tables
be kept track of, increasing overhead. Keep in mind, select id from table
order by id desc limit 1 is a snap, hits the indexes, and runs fast.

also, I don't think postgresql keeps "live" statistics, only "dead" ones
i.e. they are only gathered by running analyze. If you just need an
approximate count, you can already get that from the statistics table.

Now, if there were a simple way of assigning such behaviour to a table,
(i.e. with count or something like that) via a trigger, that might be
worth looking into, but keep in mind, it creates a serious bottleneck
during inserts/updates/deletes.
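
A minimal sketch of what such a trigger might look like (hypothetical table and function names; note that every writer funnels through the same counter row, which is exactly the bottleneck described above):

```sql
-- Hypothetical sketch: maintain an exact row count for "elog" in a
-- side table. Every insert/delete must update the one counter row,
-- so concurrent writers serialize on it.
CREATE TABLE elog_rowcount (n bigint);
INSERT INTO elog_rowcount VALUES (0);

CREATE FUNCTION elog_count_trig() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE elog_rowcount SET n = n + 1;
    ELSE
        UPDATE elog_rowcount SET n = n - 1;
    END IF;
    RETURN NULL;  -- return value ignored for AFTER row triggers
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER elog_count AFTER INSERT OR DELETE ON elog
    FOR EACH ROW EXECUTE PROCEDURE elog_count_trig();

-- Then SELECT n FROM elog_rowcount; replaces the seqscan.
```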

On Tue, 9 Sep 2003, Jean-Luc Lachance wrote:
How about keeping counts of inserts, deletes and updates per table per
transaction as part of the live statistics?

Tom Lane wrote:

I said:
Greg Stark <gs*****@mit.edu> writes:
> Things like count(*) could use int4 until it overflows though.

I don't see a reasonable way for an aggregate to change state datatype
on the fly; otherwise this would be a great solution.


On the other hand, the cost is imposed by the generic aggregate
definition that says the aggregate state transition function is an
ordinary function. This is fine for user-defined aggregates, but there
is no law that says that all the built-in aggregates must use that same
API. We could probably invent some API that allows COUNT(*) to keep its
running count someplace where it needn't be re-palloc'd on every cycle.
Something to think about for 7.5 (too late for 7.4 I fear).

regards, tom lane


Nov 11 '05 #12

In the last exciting episode, jl******@nsd.ca (Jean-Luc Lachance) wrote:
How about keeping counts of inserts, deletes and updates per table per
transaction as part of the live statistics?


Aye, there's the rub.

That's exactly what _won't_ work, and that's exactly the case that is
somewhat pathological under MVCC.

With MVCC, data "appears as if by magic" when its transaction COMMITs,
thereby revealing the rows to the world.

Unfortunately, there's no simple way of making updates to counts
"simply appear" when they take effect, not without turning the updates
into a concurrency bottleneck.

Here's a bit of a wild thought...

Assume a table with schema as follows:
create table pg_count (
xid integer, --- Actually, it's not an integer, right, Andrew? :-(
reltype oid,
count integer
);

Every transaction, "xid," affects some number of tuples. So a
transaction #2134151 that adds 5 rows to the table with oid 12345 and
deletes 4 rows from the table with oid 45678 would, as part of its work,
insert these rows:

insert into pg_count (xid, reltype, count) values (2134151, 12345, 5);
insert into pg_count (xid, reltype, count) values (2134151, 45678, -4);

In order to get the count for table 12345, you could then go to
pg_count and request:
select sum(count) from pg_count where reltype = 12345;

The size of this table would increase every time a transaction gets
committed, so presumably part of VACUUM TABLE would be a
collection/summarization, thus...

-- Collect existing stats into 1 row
insert into pg_count(xid, reltype, count) values (currxid,
currtable, select sum(count) from pg_count where reltype =
currtable);

-- Delete the old stats
delete from pg_count where reltype = currtable and xid <> currxid;

This will cope with concurrency reasonably well (modulo having to make
sure that the "collect/delete" transaction is a serialized one).

Unfortunately, if a table is updated frequently, the summary
select sum(count) from pg_count where reltype = 12345;
will have to collect together quite a large number of entries, which
makes this "less cheap."

That suggests an optimization; any time the COUNT is selected, the old
stats can and should be collected into 1 row and the old data deleted.
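
At the SQL level, the append-only delta scheme might look like this (hypothetical names; real per-transaction bookkeeping, including the xid column and MVCC visibility, would need backend support):

```sql
-- Hypothetical user-level version of the pg_count idea: writers
-- append delta rows instead of updating a shared counter, so
-- concurrent transactions don't block each other.
CREATE TABLE elog_count_delta (delta integer);

INSERT INTO elog_count_delta VALUES (5);   -- after adding 5 rows
INSERT INTO elog_count_delta VALUES (-4);  -- after deleting 4 rows

-- The current count is the sum of all committed deltas:
SELECT sum(delta) FROM elog_count_delta;
```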
--
wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').
http://www3.sympatico.ca/cbbrowne/nonrdbms.html
Sturgeon's Law: 90% of *EVERYTHING* is crud.
Nov 11 '05 #13

Tom Lane wrote:
Something to think about for 7.5 (too late for 7.4 I fear).


What about 7.4.1? This wouldn't affect how data is stored on disk,
which is what would keep an upgrade out of a minor release, right?

Nov 11 '05 #14

Joseph Shraibman <jk*@selectacast.net> writes:
Tom Lane wrote:
Something to think about for 7.5 (too late for 7.4 I fear).


What about 7.4.1? This wouldn't affect how data is stored on disk,
which is what would keep an upgrade out of a minor release, right?


Minor releases are basically bugfix-only for PG.

-Doug


Nov 11 '05 #15

Joseph Shraibman wrote:
Tom Lane wrote:
Something to think about for 7.5 (too late for 7.4 I fear).


What about 7.4.1? This wouldn't affect how data is stored on disk,
which is what would keep an upgrade out of a minor release, right?


We only do bug fixes in minor releases like 7.4.1.

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


Nov 11 '05 #16


Christopher,

You did not quite understand.

The counts in question are the actual counts (deltas) for the
transactions.

the tuple should be something like:
xid, reltype, insert_count, update_count, delete_count

When a COUNT(*) is issued, the committed tuples are totaled up, stored
as xid 0 or whatever, and the committed tuples deleted (all this with
appropriate locks).

I am not sure if there is a need for the update count.

JLL
Christopher Browne wrote:

In the last exciting episode, jl******@nsd.ca (Jean-Luc Lachance) wrote:
How about keeping counts of inserts, deletes and updates per table per
transaction as part of the live statistics?


Aye, there's the rub.

That's exactly what _won't_ work, and that's exactly the case that is
somewhat pathological under MVCC.

With MVCC, data "appears as if by magic" when its transaction COMMITs,
thereby revealing the rows to the world.

Unfortunately, there's no simple way of making updates to counts
"simply appear" when they take effect, not without turning the updates
into a concurrency bottleneck.

Here's a bit of a wild thought...

Assume a table with schema as follows:
create table pg_count (
xid integer, --- Actually, it's not an integer, right, Andrew? :-(
reltype oid,
count integer
);

Every transaction, "xid," affects some number of tuples. So that for
a transaction, #2134151 that adds 5 rows to table with oid 12345 and deletes 4
rows from table with 45678, part of the transaction would include
inserting these rows:

insert into pg_count (xid, reltype, count) values (2134151, 12345, 5);
insert into pg_count (xid, reltype, count) values (2134151, 45678, -4);

In order to get the count for table 12345, you could then go to
pg_count and request:
select sum(count) from pg_count where reltype = 12345;

The size of this table would increase every time a transaction gets
committed, so presumably part of VACUUM TABLE would be a
collection/summarization, thus...

-- Collect existing stats into 1 row
insert into pg_count(xid, reltype, count) values (currxid,
currtable, select sum(count) from pg_count where reltype =
currtable);

-- Delete the old stats
delete from pg_count where reltype = currtable and xid <> currxid;

This will cope with concurrency reasonably well (modulo having to make
sure that the "collect/delete" transaction is a serialized one).

Unfortunately, if a table is updated frequently, the summary
select sum(count) from pg_count where reltype = 12345;
will have to collect together quite a large number of entries, which
makes this "less cheap."

That suggests an optimization; any time the COUNT is selected, the old
stats can and should be collected into 1 row and the old data deleted.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').
http://www3.sympatico.ca/cbbrowne/nonrdbms.html
Sturgeon's Law: 90% of *EVERYTHING* is crud.


Nov 11 '05 #20

On Mon, Sep 08, 2003 at 05:47:48PM -0400, Tom Lane wrote:
Greg Stark <gs*****@mit.edu> writes:
Things like count(*) could use int4 until it overflows though.


I don't see a reasonable way for an aggregate to change state datatype
on the fly; otherwise this would be a great solution.


What about estimating which one you'll need based on statistics? Yeah,
might fail some times, but see my next comment...
Is int4 a pass-by-value datatype on 32-bit machines?


Yeah. Not too long ago, count() used int4, but we got some complaints
about it overflowing on big tables.


MS SQL (and I believe some other databases) provides COUNT_BIG(), which
returns an int8, for this very reason. Many times you *know* you won't
have more than int4 rows (ie, if you're using an int4 as a PK), so
there's no reason to spend the overhead associated with an int8.

Maybe have count4(), count8(), and have count() guess as to which one it
should use based on statistics?
--
Jim C. Nasby, Database Consultant ji*@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Nov 11 '05 #21

