
Interpreting vacuum verbosity


If I see VACUUM ANALYZE VERBOSE output like this...

INFO: --Relation public.foo--
INFO: Index idx_foo_bar: Pages 219213; Tuples 28007: Deleted 9434.
CPU 17.05s/4.58u sec elapsed 3227.62 sec.

...am I correct in reading this to say that it took more than 53 minutes
(3227 secs) to get 17 seconds of CPU time? Is this an indicator of
possible I/O contention? What else would account for this if my CPUs are
clearly not very busy?

TIA.

Nov 23 '05 #1
"Ed L." <pg***@bluepolka.net> writes:
If I see VACUUM ANALYZE VERBOSE output like this...

INFO: --Relation public.foo--
INFO: Index idx_foo_bar: Pages 219213; Tuples 28007: Deleted 9434.
CPU 17.05s/4.58u sec elapsed 3227.62 sec.

...am I correct in reading this to say that it took more than 53 minutes
(3227 secs) to get 17 seconds of CPU time? Is this an indicator of
possible I/O contention?


More like "your disk drives are being pounded into the ground" ?

It's hard to evaluate this without knowing what else is going on in your
system at the same time. In general a pure VACUUM process *ought* to be
I/O bound. But without any additional data it's hard to say if 200:1
CPU vs I/O ratio is reasonable or not. Were other things happening at
the same time, and if so did they seem bogged down? What sort of
hardware is this on anyway?
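
For reference, here is the back-of-the-envelope arithmetic behind that
ratio, using only the numbers from the quoted VACUUM line (a sanity check,
nothing more):

-- 17.05 s of system CPU plus 4.58 s of user CPU vs. 3227.62 s elapsed.
SELECT round(3227.62 / 17.05)          AS elapsed_per_sys_cpu,   -- ~189
       round(3227.62 / (17.05 + 4.58)) AS elapsed_per_total_cpu; -- ~149

Either way, the backend was on the CPU for well under 1% of the wall-clock
time; the rest was spent waiting, presumably on I/O.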

regards, tom lane


Nov 23 '05 #2
On Thursday May 6 2004 10:30, Tom Lane wrote:
"Ed L." <pg***@bluepolka.net> writes:
If I see VACUUM ANALYZE VERBOSE output like this...

INFO: --Relation public.foo--
INFO: Index idx_foo_bar: Pages 219213; Tuples 28007: Deleted 9434.
CPU 17.05s/4.58u sec elapsed 3227.62 sec.

...am I correct in reading this to say that it took more than 53
minutes (3227 secs) to get 17 seconds of CPU time? Is this an
indicator of possible I/O contention?


More like "your disk drives are being pounded into the ground" ?

It's hard to evaluate this without knowing what else is going on in your
system at the same time. In general a pure VACUUM process *ought* to be
I/O bound. But without any additional data it's hard to say if 200:1
CPU vs I/O ratio is reasonable or not. Were other things happening at
the same time, and if so did they seem bogged down? What sort of
hardware is this on anyway?


There was a ton of other activity; tens to hundreds of inserts and deletes
occurring per second. Lots of bogged-down, ridiculously slow queries:
30-second selects on a 500-row table immediately after ANALYZE finished on
the table, absurdly long inserts, etc. This is a SmartArray 5i/32 RAID5
device with some sort of Dell RAID controller, I believe, 160 MB/s, dual
3.2 GHz Xeons, plenty of RAM.

Some s/w redesign cut the I/O very significantly, but it was still
ridiculously slow. After seeing the VACUUM ANALYZE VERBOSE output for the
most troublesomely slow table, and noticing 2.5M unused tuples there, we
decided to drop/recreate/reload that table to reclaim the space and on the
hunch that it might be related. We did that in a transaction without any
customer downtime, and upon reloading, the system was blazing fast again.
Joy. That was cool.
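
For anyone curious, the rebuild looked roughly like the sketch below. The
table, column, and index names are hypothetical stand-ins, it assumes writes
to the table are paused or queued during the swap, and any views, foreign
keys, or triggers referencing the table would need separate handling:

BEGIN;
-- Copy only the live rows into a fresh table; no dead space comes along.
CREATE TABLE foo_new AS SELECT * FROM foo;
-- Recreate whatever indexes the application needs.
CREATE INDEX idx_foo_new_bar ON foo_new (bar);
-- Swap the rebuilt table in under the old name.
DROP TABLE foo;
ALTER TABLE foo_new RENAME TO foo;
COMMIT;
-- Refresh planner statistics for the rebuilt table.
ANALYZE foo;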

I guess the activity just totally outran the ability of autovac to keep up.
I was under the impression that unused tuples were only a diskspace issue
and not such a performance issue, but maybe the live data just got so
fragmented that it took forever to perform small scans over so many pages?

Nov 23 '05 #3
"Ed L." <pg***@bluepolka.net> writes:
I guess the activity just totally outran the ability of autovac to keep up.


Could you have been bitten by autovac's bug that misreads '3e6' as '3'?
If you don't have a recent version it's likely to fail to vacuum large
tables often enough.

regards, tom lane


Nov 23 '05 #4
On Friday May 7 2004 9:09, Tom Lane wrote:
"Ed L." <pg***@bluepolka.net> writes:
I guess the activity just totally outran the ability of autovac to keep
up.


Could you have been bitten by autovac's bug that misreads '3e6' as '3'?
If you don't have a recent version it's likely to fail to vacuum large
tables often enough.


No, our autovac logs the number of changes (upd+del for vac, upd+ins+del for
analyze) on each round of checks, and we can see it was routinely
performing when expected. The number of updates/deletes just far exceeded
the thresholds. Vac threshold was 2000, and at times there might be
300,000 outstanding changes in the 10-30 minutes between vacuums.
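
For what it's worth, the same kind of churn numbers can be read straight out
of the stats collector (assuming row-level stats are enabled); a rough query
against the standard pg_stat_user_tables view:

-- Tables with the most update/delete churn since the counters were last reset.
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
ORDER BY n_tup_upd + n_tup_del DESC
LIMIT 10;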

Given the gradual performance degradation we saw over a period of days if
not weeks, and the extremely high numbers of unused tuples, I'm wondering
if there is something like a data-fragmentation problem occurring, in which
we're having to read many, many disk pages to get just a few tuples off each
page? This cluster has 3 databases (2 nearly idle) with a total of 600
tables (about 300 in the active database). Gzipped dumps are 1.7GB.
max_fsm_relations = 1000 and max_fsm_pages = 10000. The pattern of ops is
a continuous stream of inserts, sequential scan selects, and deletes.

Nov 23 '05 #5
"Ed L." <pg***@bluepolka.net> writes:
No, our autovac logs the number of changes (upd+del for vac, upd+ins+del for
analyze) on each round of checks, and we can see it was routinely
performing when expected. The number of updates/deletes just far exceeded
the thresholds. Vac threshold was 2000, and at times there might be
300,000 outstanding changes in the 10-30 minutes between vacuums.

Well, in that case you probably want a lot less than "10-30 minutes"
between vacuums, at least for this particular table. I don't know how
one configures autovac for this, but I suppose it can be done ...
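
If the daemon can't be told to treat one table specially (I don't believe the
contrib autovacuum of this era has per-table thresholds), a blunt workaround
is to vacuum just the hot table on its own, more aggressive schedule, say
every few minutes from cron. Table name here is hypothetical:

-- Plain (lazy) VACUUM of only the heavily-churned table; it doesn't lock out
-- readers or writers, so it can safely run quite frequently.
VACUUM ANALYZE foo;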
max_fsm_relations = 1000 and max_fsm_pages = 10000.


Also you doubtless need max_fsm_pages a lot higher than that. A
conservative setting would make it as big as your whole database,
eg for a 10Gb disk footprint use 10Gb/8K (something upwards of
a million) FSM page slots.
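
One rough way to see the current page count (per database; the FSM covers the
whole cluster, so sum across databases) is to ask the catalogs. The figures
are only as fresh as the last VACUUM or ANALYZE:

-- Approximate 8K pages used by all tables, indexes, and TOAST relations
-- in the current database, according to pg_class.
SELECT sum(relpages)            AS total_pages,
       sum(relpages) * 8 / 1024 AS approx_mb
FROM pg_class;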

regards, tom lane


Nov 23 '05 #6
At some point in time, tg*@sss.pgh.pa.us (Tom Lane) wrote:
max_fsm_relations = 1000 and max_fsm_pages = 10000.


Also you doubtless need max_fsm_pages a lot higher than that. A
conservative setting would make it as big as your whole database,
eg for a 10Gb disk footprint use 10Gb/8K (something upwards of
a million) FSM page slots.


At some point, someone was going to write a "white paper" detailing how one
might go about setting these parameters. If that someone has done so, I'd love
to hear about it. If that someone hasn't ... well, how much beer would we have
to provide to get you to talk? 8-)

--
Jeff Boes vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
...Nexcerpt... Extend your Expertise
Nov 23 '05 #7
On Friday May 7 2004 11:25, Tom Lane wrote:
"Ed L." <pg***@bluepolka.net> writes:
No, our autovac logs the number of changes (upd+del for vac,
upd+ins+del for analyze) on each round of checks, and we can see it was
routinely performing when expected. The number of updates/deletes just
far exceeded the thresholds. Vac threshold was 2000, and at times
there might be 300,000 outstanding changes in the 10-30 minutes between
vacuums.


Well, in that case you probably want a lot less than "10-30 minutes"
between vacuums, at least for this particular table. I don't know how
one configures autovac for this, but I suppose it can be done ...


This period is the minimum time it takes to vacuum or analyze every table
that "needs it" in round-robin fashion. Sometimes it is much shorter
(seconds), sometimes longer, depending on how much upd/del/ins activity
there has been. That seems too long/slow.
max_fsm_relations = 1000 and max_fsm_pages = 10000.


Also you doubtless need max_fsm_pages a lot higher than that. A
conservative setting would make it as big as your whole database,
eg for a 10Gb disk footprint use 10Gb/8K (something upwards of
a million) FSM page slots.


Ah, OK. Two questions:

1) I'm inclined to set this to handle as large a DB footprint as will be in
the coming year or two, so maybe 3X what it is now. What is the
impact/cost of setting max_fsm_pages at, say, 3M for an 8GB footprint? (3
x 8GB/8K)

2) Would this low setting of 10000 explain the behavior we saw of seqscans
of a perfectly analyzed table with 1000 rows requiring ridiculous amounts
of time even after we cut off the I/O load?


Nov 23 '05 #8
On Friday May 7 2004 12:23, Ed L. wrote:
On Friday May 7 2004 12:20, Ed L. wrote:
1) I'm inclined to set this to handle as large a DB footprint as will
be in the coming year or two, so maybe 3X what it is now. What is the
impact/cost of setting max_fsm_pages at, say, 3M for an 8GB footprint?
(3 x 8GB/8K)


Ok, so I see 40B per, so setting it to 3M ==> 3M * 40 = 120MB of
additional RAM usage for this? Any other impacts with which to be
concerned?


Sorry, I see that's *6B* per, so setting it to 3M ==> 18MB, which is trivial
for the benefit. Any other concerns in setting this too high?
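
Concretely, that's just a couple of lines in postgresql.conf (hypothetical
values shown; a postmaster restart is required for them to take effect):

# Free space map sizing; shared memory for this is allocated at server start.
max_fsm_pages = 3000000        # ~6 bytes per page slot, roughly 18 MB
max_fsm_relations = 1000       # must cover every table and index with free space to track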


Nov 23 '05 #9
"Ed L." <pg***@bluepolka.net> writes:
2) Would this low setting of 10000 explain the behavior we saw of seqscans
of a perfectly analyzed table with 1000 rows requiring ridiculous amounts
of time even after we cut off the I/O load?


Possibly. The undersized setting would cause leakage of disk space
(that is, new rows get appended to the end of the table even when space
is available within the table, because the system has "forgotten" about
that space due to lack of FSM slots to remember it in). If the physical
size of the table file gets large enough, seqscans will take a long time
no matter how few live rows there are. I don't recall now whether your
VACUUM VERBOSE results showed that the physical table size (number of
pages) was out of proportion to the actual number of live rows. But it
sure sounds like that might have been the problem.
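
A quick way to check that after the fact is to compare the page count and the
live-row estimate in pg_class (both only as accurate as the last VACUUM or
ANALYZE); 'foo' here is the table from the original report:

-- A large relpages against a small reltuples is the bloated-table signature.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'foo';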

regards, tom lane


Nov 23 '05 #10
"Ed L." <pg***@bluepolka.net> writes:
Sorry, I see that's *6B* per, so setting it to 3M ==> 18MB, which is trivial
for the benefit. Any other concerns in setting this too high?


Not that I know of.

regards, tom lane


Nov 23 '05 #11
On Friday May 7 2004 12:20, Ed L. wrote:

1) I'm inclined to set this to handle as large a DB footprint as will be
in the coming year or two, so maybe 3X what it is now. What is the
impact/cost of setting max_fsm_pages at, say, 3M for an 8GB footprint?
(3 x 8GB/8K)


Ok, so I see 40B per, so setting it to 3M ==> 3M * 40 = 120MB of additional
RAM usage for this? Any other impacts with which to be concerned?

Nov 23 '05 #12
Jeff Boes <jb***@nexcerpt.com> writes:
At some point, someone was going to write a "white paper" detailing how one
might go about setting these parameters.


In 7.4, it's relatively easy to check on whether your settings are
reasonable: just do a VACUUM VERBOSE (database-wide) and check the
FSM requirements indicated at the end of the tediously chatty output.

All I have handy to illustrate with is a test server that has only the
regression test database loaded in it, so these numbers are very small,
but what I see is:

INFO: free space map: 280 relations, 520 pages stored; 4720 total pages needed
DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.

This says that what I actually need to keep track of the present free
space in the database is 280 FSM relation slots and 4720 FSM page slots.
So the allocated space is plenty comfy here. If the "pages needed"
number is significantly larger than your max_fsm_pages setting, then you
have a problem.
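
(The command itself is just the bare, database-wide form below; on a large
installation it can take quite a while to run.)

-- No table name, so it vacuums the whole database; the FSM summary shows up
-- in the final INFO/DETAIL lines.
VACUUM VERBOSE;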

regards, tom lane


Nov 23 '05 #13
On Monday May 10 2004 11:37, Ed L. wrote:
On Friday May 7 2004 12:48, Tom Lane wrote:
"Ed L." <pg***@bluepolka.net> writes:
2) Would this low setting of 10000 explain the behavior we saw of
seqscans of a perfectly analyzed table with 1000 rows requiring
ridiculous amounts of time even after we cut off the I/O load?


Possibly. The undersized setting would cause leakage of disk space
(that is, new rows get appended to the end of the table even when space
is available within the table, because the system has "forgotten" about
that space due to lack of FSM slots to remember it in). If the
physical size of the table file gets large enough, seqscans will take a
long time no matter how few live rows there are. I don't recall now
whether your VACUUM VERBOSE results showed that the physical table size
(number of pages) was out of proportion to the actual number of live
rows. But it sure sounds like that might have been the problem.


If it were indeed the case that we'd leaked a lot of diskspace, then
after bumping max_fsm_pages up to a much higher number (4M), will these
pages gradually be "remembered" as they are accessed by autovac and or
queried, etc? Or is a dump/reload or 'vacuum full' the only way? Trying
to avoid downtime...


I mean, I see that our plain VACUUM (not FULL) does appear to be truncating and
reducing the number of pages in some cases. Is that possible? If so, I'm
thinking a DB restart (to raise max_fsm_pages) will be much less complicated
than dropping/reloading the individual table. VACUUM FULL has always been way
too slow for our purposes; I'm not sure why.

TIA.


Nov 23 '05 #14
On Friday May 7 2004 12:48, Tom Lane wrote:
"Ed L." <pg***@bluepolka.net> writes:
2) Would this low setting of 10000 explain the behavior we saw of
seqscans of a perfectly analyzed table with 1000 rows requiring
ridiculous amounts of time even after we cut off the I/O load?


Possibly. The undersized setting would cause leakage of disk space
(that is, new rows get appended to the end of the table even when space
is available within the table, because the system has "forgotten" about
that space due to lack of FSM slots to remember it in). If the physical
size of the table file gets large enough, seqscans will take a long time
no matter how few live rows there are. I don't recall now whether your
VACUUM VERBOSE results showed that the physical table size (number of
pages) was out of proportion to the actual number of live rows. But it
sure sounds like that might have been the problem.


If it were indeed the case that we'd leaked a lot of diskspace, then after
bumping max_fsm_pages up to a much higher number (4M), will these pages
gradually be "remembered" as they are accessed by autovac and or queried,
etc? Or is a dump/reload or 'vacuum full' the only way? Trying to avoid
downtime...

Nov 23 '05 #15
"Ed L." <pg***@bluepolka.net> writes:
If it were indeed the case that we'd leaked a lot of diskspace, then after
bumping max_fsm_pages up to a much higher number (4M), will these pages
gradually be "remembered" as they are accessed by autovac and or queried,
etc? Or is a dump/reload or 'vacuum full' the only way? Trying to avoid
downtime...


The next vacuum will add the "leaked" space back into the FSM, once
there's space there to remember it. You don't need to do anything
drastic, unless you observe that the amount of wasted space is so large
that a vacuum full is needed.

BTW, these days, a CLUSTER is a good alternative to a VACUUM FULL; it's
likely to be faster if the VACUUM would involve moving most of the live
data anyway.
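
For the table from the original report, the 7.x-era syntax would look like the
sketch below. Note that CLUSTER takes an exclusive lock for the duration and
needs enough free disk for a complete copy of the table:

-- Rewrites foo in idx_foo_bar order, leaving the dead space behind.
CLUSTER idx_foo_bar ON foo;
-- Refresh statistics afterwards.
ANALYZE foo;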

regards, tom lane


Nov 23 '05 #16
