
Identifying diskspace leakage

I am trying to identify tables with significant diskspace "leakage" due to
inappropriately low max_fsm_pages settings. I can see the counts of tuples
and unused tuples in VACUUM ANALYZE VERBOSE output, and understand that
unused / (tuples + unused) approximates the fraction of disk space that
could be reclaimed with a VACUUM FULL or dump/reload.

Is there a way to identify the number of unused tuples without performing a
VACUUM? Is it stored in a system table anywhere? Any other ideas on how to
identify disk bloat short of forcing downtime?

TIA.

Nov 23 '05 #1


On Fri, 2004-05-14 at 10:10, Ed L. wrote:
I am trying to identify tables with significant diskspace "leakage" due to
inappropriately low max_fsm_pages settings. I can see the counts of tuples
and unused tuples in VACUUM ANALYZE VERBOSE output, and understand that
unused / (tuples + unused) approximates the fraction of disk space that
could be reclaimed with a VACUUM FULL or dump/reload.

Is there a way to identify the number of unused tuples without performing a
VACUUM? Is it stored in a system table anywhere? Any other ideas on how to
identify disk bloat short of forcing downtime?


You can calculate the number of bytes per row, multiply by the number of
live tuples (count(1) from table), and subtract that from the actual #
of bytes in the on-disk representation. The difference is wasted space.
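
In SQL terms, a minimal sketch of that arithmetic (the table name mytable
is hypothetical, relation_size() assumes the contrib/dbsize module is
installed, and the count is a full scan):

    -- 1. Average bytes per row, from the planner statistics:
    SELECT SUM(avg_width) FROM pg_stats WHERE tablename = 'mytable';

    -- 2. Number of live rows (scans the whole table):
    SELECT count(1) FROM mytable;

    -- 3. Actual on-disk bytes, via contrib/dbsize:
    SELECT relation_size('mytable');

    -- Wasted bytes = (3) - (1) * (2)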

-jwb

Nov 23 '05 #2

On Friday May 14 2004 11:47, Jeffrey W. Baker wrote:
Is there a way to identify the number of unused tuples without
performing a VACUUM? Is it stored in a system table anywhere? Any other
ideas on how to identify disk bloat short of forcing downtime?


You can calculate the number of bytes per row, multiply by the number of
live tuples (count(1) from table), and subtract that from the actual #
of bytes in the on-disk representation. The difference is wasted space.


That works, but with umpteen clusters to manage, I'm really hoping for a
SQL-based check so it can be done remotely and non-interactively. Maybe it
is too much to keep track of, but it would be cool if VACUUM updated a
system table with the same info it emits in verbose mode. That would be
very helpful in auto-identifying leakage, and also in a recent case where
the CPU-to-real-time ratio during vacuum went through the roof due to I/O
overload from leakage.
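
One partial workaround, assuming the statistics collector is running with
row-level stats enabled: the cumulative counters in pg_stat_user_tables
give a rough upper bound on dead rows created since the counters were last
reset, and can be polled remotely over plain SQL:

    -- Rough proxy for dead-tuple creation per table since the last
    -- stats reset; requires the statistics collector with row-level
    -- stats enabled.
    SELECT relname, n_tup_upd + n_tup_del AS dead_rows_created
    FROM pg_stat_user_tables
    ORDER BY dead_rows_created DESC;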

Nov 23 '05 #3

Here's an attempt at a query to estimate diskspace leakage. This
leakage can occur when max_fsm_pages and/or max_fsm_relations are
set too low. I'm not sure which of the two approaches below (leak1 or
leak2) is more accurate; is there a better way via SQL?

The query uses the 'dbsize' project from contrib. dbsize has a
function called relation_size(), which performs a stat() to get
actual disk usage for a database and/or table. I use the column
pg_class.reltuples instead of actually counting rows because I
suspect a full count would essentially flush the OS cache of useful
pages, degrading performance. This query assumes you're keeping
stats updated.
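
(If the stats are stale, a plain ANALYZE is enough to refresh them; the
table name below is hypothetical:)

    -- Refresh pg_class.reltuples/relpages and the pg_stats rows for
    -- one table; far cheaper than a VACUUM.
    ANALYZE mytable;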

SELECT c.relname,
       SUM(s.avg_width) AS width,
       CAST(c.reltuples AS BIGINT) AS tuples,
       CAST(SUM(s.avg_width) * c.reltuples / 1048576 AS INTEGER) AS tupdu,
       c.relpages AS pages,
       CAST(c.relpages * 8192 / 1048576 AS INTEGER) AS pgdu,
       relation_size(s.tablename) / 1048576 AS reldu,
       CAST((relation_size(s.tablename)
             - SUM(s.avg_width) * c.reltuples) / 1048576 AS INTEGER) AS leak1,
       CAST((relation_size(s.tablename)
             - c.relpages * 8192) / 1048576 AS INTEGER) AS leak2
FROM pg_stats s, pg_class c
WHERE c.relname NOT LIKE 'pg_%'
  AND c.relname = s.tablename
GROUP BY c.oid, s.tablename, c.reltuples, c.relpages, pgdu
ORDER BY tupdu;

relname | width | tuples | tupdu | pages | pgdu | reldu | leak1 | leak2
---------------+-------+---------+-------+--------+------+-------+-------+-------
table_1766485 | 27 | 198 | 0 | 12 | 0 | 0 | 0 | 0
table_1766443 | 186 | 0 | 0 | 9317 | 72 | 72 | 73 | 0
table_1766439 | 83 | 0 | 0 | 10 | 0 | 0 | 0 | 0
table_1766435 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0
table_1766437 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0
table_1766421 | 23 | 2 | 0 | 1 | 0 | 0 | 0 | 0
table_1766451 | 30 | 189822 | 5 | 1754 | 13 | 13 | 8 | 0
table_1766396 | 48 | 278781 | 13 | 3185 | 24 | 24 | 12 | 0
table_1766391 | 74 | 200826 | 14 | 3271 | 25 | 25 | 11 | 0
table_1766446 | 36 | 504594 | 17 | 4881 | 38 | 38 | 21 | 0
table_1766426 | 149 | 2241719 | 319 | 55555 | 434 | 434 | 116 | 0
table_1766456 | 888 | 390657 | 331 | 637949 | 887 | 4983 | 4653 | 4096
table_1766399 | 596 | 732708 | 416 | 41876 | 327 | 327 | -89 | 0
(13 rows)

The basic column definitions are:

tupdu (MB) = SUM(avg_width) * reltuples / 1048576
pgdu  (MB) = relpages * 8192 / 1048576 (8 KB per page)
reldu (MB) = relation_size(tablename) / 1048576 (src/contrib/dbsize)
leak1 = reldu - tupdu
leak2 = reldu - pgdu
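
For example, for table_1766456: leak1 = 4983 - 331 ≈ 4653 MB (integer
rounding) and leak2 = 4983 - 887 = 4096 MB, making it the standout bloat
candidate in the list above.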

I'm not sure how we ended up with a couple of cases where the number of
MB on disk was less than the estimated size; maybe we had some
deletions after the last update of pg_stats?


Nov 23 '05 #4
