
Identifying diskspace leakage

I am trying to identify tables with significant diskspace "leakage" due to
inappropriately low max_fsm_pages settings. I can see the counts of tuples
and unused tuples in VACUUM ANALYZE VERBOSE output, and understand that
unused / (tuples + unused) approximates the fraction of disk space that
could be reclaimed with a VACUUM FULL or dump/reload.

Is there a way to identify the number of unused tuples without performing a
VACUUM? Is it stored in a system table anywhere? Any other ideas on how to
identify disk bloat short of forcing downtime?

TIA.

Nov 23 '05 #1


On Fri, 2004-05-14 at 10:10, Ed L. wrote:
I am trying to identify tables with significant diskspace "leakage" due to
inappropriately low max_fsm_pages settings. I can see the counts of tuples
and unused tuples in VACUUM ANALYZE VERBOSE output, and understand that
unused / (tuples + unused) approximates the fraction of disk space that
could be reclaimed with a VACUUM FULL or dump/reload.

Is there a way to identify the number of unused tuples without performing a
VACUUM? Is it stored in a system table anywhere? Any other ideas on how to
identify disk bloat short of forcing downtime?


You can calculate the number of bytes per row, multiply by the number of
live tuples (count(1) from table), and subtract that from the actual #
of bytes in the on-disk representation. The difference is wasted space.
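
In SQL terms, a minimal sketch of that arithmetic (the table name mytable
is hypothetical, relation_size() assumes the contrib/dbsize module is
installed, and the count is a full scan):

    -- 1. Average bytes per row, from the planner statistics:
    SELECT SUM(avg_width) FROM pg_stats WHERE tablename = 'mytable';

    -- 2. Number of live rows (scans the whole table):
    SELECT count(1) FROM mytable;

    -- 3. Actual on-disk bytes, via contrib/dbsize:
    SELECT relation_size('mytable');

    -- Wasted bytes = (3) - (1) * (2)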

-jwb

Nov 23 '05 #2

On Friday May 14 2004 11:47, Jeffrey W. Baker wrote:
Is there a way to identify the number of unused tuples without
performing a VACUUM? Is it stored in a system table anywhere? Any other
ideas on how to identify disk bloat short of forcing downtime?


You can calculate the number of bytes per row, multiply by the number of
live tuples (count(1) from table), and subtract that from the actual #
of bytes in the on-disk representation. The difference is wasted space.


That works, but with umpteen clusters to manage, I'm really hoping for a
SQL-based check so it can be done remotely and non-interactively. Maybe it
is too much to keep track of, but it would be cool if VACUUM updated a
system table with the same info it emits in verbose mode. That would be
very helpful in auto-identifying leakage, and also in a recent case where
the CPU-to-real-time ratio during vacuum went through the roof due to I/O
overload from leakage.
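
One partial workaround, assuming the statistics collector is running with
row-level stats enabled: the cumulative counters in pg_stat_user_tables
give a rough upper bound on dead rows created since the counters were last
reset, and can be polled remotely over plain SQL:

    -- Rough proxy for dead-tuple creation per table since the last
    -- stats reset; requires the statistics collector with row-level
    -- stats enabled.
    SELECT relname, n_tup_upd + n_tup_del AS dead_rows_created
    FROM pg_stat_user_tables
    ORDER BY dead_rows_created DESC;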

Nov 23 '05 #3

Here's an attempt at a query to estimate diskspace leakage. This
leakage can occur when max_fsm_pages and/or max_fsm_relations are
set too low. I'm not sure which of the two approaches below (leak1 or
leak2) is more accurate; is there a better way via SQL?

The query uses the 'dbsize' project from contrib. dbsize has a
function called relation_size(), which performs a stat() to get
actual disk usage for a database and/or table. I use the column
pg_class.reltuples instead of actually counting rows because I
suspect a full count would essentially flush the OS cache of useful
pages, degrading performance. This query assumes you're keeping
stats updated.
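
(If the stats are stale, a plain ANALYZE is enough to refresh them; the
table name below is hypothetical:)

    -- Refresh pg_class.reltuples/relpages and the pg_stats rows for
    -- one table; far cheaper than a VACUUM.
    ANALYZE mytable;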

SELECT c.relname,
       SUM(s.avg_width) AS width,
       CAST(c.reltuples AS BIGINT) AS tuples,
       CAST(SUM(s.avg_width) * c.reltuples / 1048576 AS INTEGER) AS tupdu,
       c.relpages AS pages,
       CAST(c.relpages * 8192 / 1048576 AS INTEGER) AS pgdu,
       relation_size(s.tablename) / 1048576 AS reldu,
       CAST((relation_size(s.tablename)
             - SUM(s.avg_width) * c.reltuples) / 1048576 AS INTEGER) AS leak1,
       CAST((relation_size(s.tablename)
             - c.relpages * 8192) / 1048576 AS INTEGER) AS leak2
FROM pg_stats s, pg_class c
WHERE c.relname NOT LIKE 'pg_%'
  AND c.relname = s.tablename
GROUP BY c.oid, s.tablename, c.reltuples, c.relpages, pgdu
ORDER BY tupdu;

relname | width | tuples | tupdu | pages | pgdu | reldu | leak1 | leak2
---------------+-------+---------+-------+--------+------+-------+-------+-------
table_1766485 | 27 | 198 | 0 | 12 | 0 | 0 | 0 | 0
table_1766443 | 186 | 0 | 0 | 9317 | 72 | 72 | 73 | 0
table_1766439 | 83 | 0 | 0 | 10 | 0 | 0 | 0 | 0
table_1766435 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0
table_1766437 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0
table_1766421 | 23 | 2 | 0 | 1 | 0 | 0 | 0 | 0
table_1766451 | 30 | 189822 | 5 | 1754 | 13 | 13 | 8 | 0
table_1766396 | 48 | 278781 | 13 | 3185 | 24 | 24 | 12 | 0
table_1766391 | 74 | 200826 | 14 | 3271 | 25 | 25 | 11 | 0
table_1766446 | 36 | 504594 | 17 | 4881 | 38 | 38 | 21 | 0
table_1766426 | 149 | 2241719 | 319 | 55555 | 434 | 434 | 116 | 0
table_1766456 | 888 | 390657 | 331 | 637949 | 887 | 4983 | 4653 | 4096
table_1766399 | 596 | 732708 | 416 | 41876 | 327 | 327 | -89 | 0
(13 rows)

The basic column definitions are:

tupdu (MB) = SUM(avg_width) * reltuples / 1048576
pgdu  (MB) = relpages * 8192 / 1048576 (8 KB per page)
reldu (MB) = relation_size(tablename) / 1048576 (src/contrib/dbsize)
leak1 = reldu - tupdu
leak2 = reldu - pgdu
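
For example, for table_1766456: leak1 = 4983 - 331 ≈ 4653 MB (integer
rounding) and leak2 = 4983 - 887 = 4096 MB, making it the standout bloat
candidate in the list above.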

I'm not sure how we ended up with a couple of cases where the number of
MB on disk was less than the estimated size; maybe we had some
deletions after the last update of pg_stats?


Nov 23 '05 #4
