
DB cache size strategies


Here are some of my current notions on pgsql performance tuning, strictly as it
relates to pgsql tuning parameters in the context of a dedicated Linux or
HP-UX server. I'm particularly focusing on the shared_buffers setting. I
invite any corrective or confirming feedback. I realize there are many
other hugely important performance factors outside this scope.

One key aspect of pgsql performance tuning is to adjust the memory
consumption settings (shared_buffers, sort_mem, etc.) large enough to hold
as much of the database in shared memory as possible while not causing page
swap-ins. (I understand that page swap-outs and swap space usage are
normal and OK, but lots of page swap-ins are bad.) In other words, for
absolute fastest performance, we want a database buffer cache hit rate as
close to 100% as possible.

Now, how do we accurately measure this cache hit rate at any one point?
Well, here's what I currently know as the best measure for a given cluster
of databases:

SELECT SUM(pg_stat_get_db_blocks_hit(d.oid))
     / SUM(pg_stat_get_db_blocks_fetched(d.oid)) AS hit_rate
FROM pg_database d;
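
For a per-database breakdown, something along these lines should work (the
CASE just guards against division by zero on idle databases; note that
block-level stats collection, stats_block_level, has to be on for these
counters to move at all):

SELECT d.datname,
       pg_stat_get_db_blocks_hit(d.oid)     AS blks_hit,
       pg_stat_get_db_blocks_fetched(d.oid) AS blks_fetched,
       CASE WHEN pg_stat_get_db_blocks_fetched(d.oid) > 0
            THEN round(pg_stat_get_db_blocks_hit(d.oid)::numeric
                       / pg_stat_get_db_blocks_fetched(d.oid), 4)
       END AS hit_rate
FROM pg_database d
ORDER BY blks_fetched DESC;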

How do we identify when we have sub-optimal pgsql configuration parameters
in this regard? My understanding: to whatever extent the hit rate is
below 100%, our shared buffers are not large enough to hold the data we're
accessing.

How do we fix the problem? We want to increase the shared_buffers setting
and/or some combination of sort_mem, wal_buffers, and vacuum_mem, reset our
stats, and resume monitoring. I assume which of these we increase for
maximum benefit depends on the source of the cache overflow (routine
queries, sorting, vacuuming), but that shared_buffers is the most impactful
in practice. Again, we want to do this without causing page swap-ins.
When you see swap-ins, you've gone too far. If we see swap-ins while the
cache hit rate is still significantly below 100%, then we need more RAM or
lower RAM consumption.
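
As a concrete sketch of that reset-and-monitor loop (pg_stat_reset() clears
the counters only for the database you are connected to, so run it in each
database of interest):

-- after changing settings and restarting, clear the old counters
SELECT pg_stat_reset();

-- let the normal workload run for a while, then re-check the hit rate
SELECT SUM(pg_stat_get_db_blocks_hit(d.oid))
     / SUM(pg_stat_get_db_blocks_fetched(d.oid)) AS hit_rate
FROM pg_database d;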

The end result is that page swap-ins (sar -W) and the cache hit rate (query
above) are two key measures to use as warning bells.
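
When the cluster-wide hit rate sags, a per-table breakdown can show where
the misses are coming from; a sketch using the standard statistics views
(again assuming block-level stats collection is on):

SELECT relname,
       heap_blks_read, heap_blks_hit,
       idx_blks_read,  idx_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 20;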

Glad to hear your thoughts.

Ed

Nov 22 '05
On Tue, 2004-02-10 at 16:17, scott.marlowe wrote:
On Tue, 10 Feb 2004, Ed L. wrote:
On Tuesday February 10 2004 3:48, scott.marlowe wrote:
On Tue, 10 Feb 2004, Ed L. wrote:
> Interesting. Why leave very large tables to the kernel instead of the
> db cache? Assuming a dedicated DB server and a DB smaller than
> available RAM, why not give the DB enough RAM to get the entire DB into
> the DB cache? (Assuming you have the RAM).

Because the kernel is more efficient (right now) at caching large data
sets.

With the ARC cache manager that will likely wend its way into 7.5, it's
quite a likely possibility that postgresql will be able to efficiently
handle a larger cache, but it will still be a shared memory cache, and
those are still usually much slower than the kernel's cache.


Hmmm. Others have asserted/assumed they'd be roughly equivalent. It'd be
interesting to see some real data measuring the performance of the shared
mem cache vs. kernel cache. Anyone know of existing benchmarks?


Testing from various folks has shown that there is a maximum size of
somewhere around 256 meg or so where the returns diminish pretty quickly,
unless your working data sets are larger than that. I.e., setting up a
database that needs 20 megs to throw data around with 200 megs of
buffers is usually slower than handing it 32 megs of buffers and letting
the kernel do the rest.

But ARC may change all of that. I'll have to grab a CVS snapshot and see
how it behaves with large datasets and shared buffers...

The latest osdl-dbt3 run on our STP system used about 250 MB for shared
buffers. The database was about 3.8 GB. The system has 8 GB of RAM, most of
which is used by the database. I did try to increase shared_buffers,
but the performance decreased. For this workload, the bottleneck is the
I/O for the pgsql_tmp directory, since these are complicated queries
with GROUP BY and sorting.
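
One way to probe that kind of bottleneck, assuming a 7.x-era server where
the per-sort memory setting is still called sort_mem (it became work_mem in
8.0), is to raise it for a single session and re-run one of the offending
queries; the table and column names here are hypothetical:

SET sort_mem = 65536;   -- 64 MB per sort operation, this session only
EXPLAIN ANALYZE
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC;
RESET sort_mem;

If the runtime drops sharply, the sorts were spilling to pgsql_tmp.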

The result is at:
http://khack.osdl.org/stp/287555/

It would be interesting to see how 7.5 handles this workload. Also you
are welcome to use STP to try out different database parameters with
this workload. Let me know if you are interested, I will follow up with
more instructions.

Jenny

Nov 22 '05 #11
"Ed L." <pg***@bluepolk a.net> writes:
On Tuesday February 10 2004 1:42, Martijn van Oosterhout wrote:
I think giving postgresql half your memory is couterproductiv e.
I get the sense it really depends on the nature of the apps running on the
box.


Giving PG half the RAM is counterproductive no matter what --- that
pretty much guarantees that every page that's in RAM will be in RAM
twice, once in PG buffers and once in kernel buffers. The two
reasonable schools of thought are (1) to make PG buffers relatively
small and let the kernel do the bulk of the buffering, or (2) to give
PG most of the RAM and expect it to do the bulk of the buffering.

Past experience has been that theory (1) generally wins. Jan's recent
work on ARC buffering may make things more interesting, though.

regards, tom lane

Nov 22 '05 #12
Martha Stewart called it a Good Thing when pg***@bluepolka.net ("Ed L.") wrote:
On Tuesday February 10 2004 3:48, scott.marlowe wrote:
On Tue, 10 Feb 2004, Ed L. wrote:
> Interesting. Why leave very large tables to the kernel instead
> of the db cache? Assuming a dedicated DB server and a DB smaller
> than available RAM, why not give the DB enough RAM to get the
> entire DB into the DB cache? (Assuming you have the RAM).


Because the kernel is more efficient (right now) at caching large
data sets.

With the ARC cache manager that will likely wend its way into 7.5,
it's quite a likely possibility that postgresql will be able to
efficiently handle a larger cache, but it will still be a shared
memory cache, and those are still usually much slower than the
kernel's cache.


Hmmm. Others have asserted/assumed they'd be roughly equivalent.
It'd be interesting to see some real data measuring the performance
of the shared mem cache vs. kernel cache. Anyone know of existing
benchmarks?


The problem with the shared memory cache is that there seems to be
some variation from OS to OS as to how efficiently the system can
manage shared memory as the amount of it grows.

The same is commonly true for various sorts of resources:

- On Linux, for a long time, you didn't want to have a lot of files in
one directory as it did a linked-list scan to look for files.
ReiserFS specifically planned an improvement to this by replacing
the linked list with a b-tree.

- Linux and Solaris are conspicuously different in their process
scheduling models. With Solaris, performance suffers if you have
a great deal of processes; the architecture encourages use of
threading to get good performance. In contrast, Linux 2.6 has
what they call an "O(1)" process scheduler where having lots of
processes does not substantially slow down OS performance.

The benchmarks have been a bit limited thus far; people haven't found
having much more than 10000 buffer pages to be particularly
helpful. And there has been some indication that sizes larger than
that can slow things down. And that may not be a constant across OS
implementations ...
--
"cbbrowne","@", "cbbrowne.c om"
http://www3.sympatico.ca/cbbrowne/rdbms.html
Rules of the Evil Overlord #151. "I will not set myself up as a
god. That perilous position is reserved for my trusted lieutenant."
<http://www.eviloverlord.com/>
Nov 22 '05 #13
On Tuesday February 10 2004 10:26, Tom Lane wrote:

Giving PG half the RAM is counterproductive no matter what --- that
pretty much guarantees that every page that's in RAM will be in RAM
twice, once in PG buffers and once in kernel buffers. The two
reasonable schools of thought are (1) to make PG buffers relatively
small and let the kernel do the bulk of the buffering, or (2) to give
PG most of the RAM and expect it to do the bulk of the buffering.

Past experience has been that theory (1) generally wins. Jan's recent
work on ARC buffering may make things more interesting, though.


So, pursuing theory (1) in 'the time before ARC', assuming you have a
dedicated box with little or no non-db competition for resources, why give
PG any more than is absolutely required (max_connections*2 or 16)? Why not
just let the kernel reign as completely as possible? Is that what you mean
by "relatively small"?

TIA.

Nov 22 '05 #14
"Ed L." <pg***@bluepolk a.net> writes:
So, pursuing theory (1) in 'the time before ARC', assuming you have a
dedicated box with little or no non-db competition for resources, why give
PG anymore than is absolutely required (max_connection s*2 or 16)? Why not
just let the kernel reign as completely as possible? Is that what you mean
by "relatively small"?


Well, if you go *really* small then you find a lot of CPU time gets
wasted shuffling data from kernel cache to PG cache. The sweet spot
for theory (1) seems to be to set shared_buffers in the range of 1000 to
10000 buffers. (Time was that that was a serious amount of RAM, but
not any more...)

regards, tom lane
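
As a rough yardstick for those numbers, assuming the default 8 kB block
size, 1000 buffers is roughly 8 MB and 10000 buffers roughly 80 MB:

SHOW shared_buffers;   -- in this era, reported as a count of 8 kB buffers, not a size
SELECT 1000  * 8 / 1024.0 AS mb_at_1000_buffers,    -- about 7.8 MB
       10000 * 8 / 1024.0 AS mb_at_10000_buffers;   -- about 78 MB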


Nov 22 '05 #15
I ran some tests on my production system during a maintenance window, so no users were using the system. It runs on an old box with a K6-2/500, 3x 128 MB DIMMs, and pg 7.3.2.
I have a table with around 100,000 rows like this: owner int8, id int8, content text, with an index on column owner (this index is used by the test query).

I ran a test query: select * from table where owner='10000' order by id offset 100 limit 50. There are around 10,000 rows before the limit and offset apply. I ran explain analyze and got this:

Case 1 (no memory to pg):
sort_mem = 64 kB
effective_cache_size = 64 (8 kB blocks, around 512 kB)

Takes 3300 ms.

Case 2 (give a lot of RAM to postgres):
sort_mem = 64000 kB
effective_cache_size = 8192 (8 kB blocks)

Swapping occurred; execution takes 5400 ms.

Case 3:
sort_mem = 16000 kB
effective_cache_size = 512 (8 kB blocks, around 4 MB)

Takes 1200 ms. The best result.

Case 4:
sort_mem = 16000 kB
effective_cache_size = 64 (8 kB blocks, around 512 kB)

Takes 1900 ms. In this case I tried increasing effective_cache_size step by step (64, 128, 256, 512, 1024), but raising effective_cache_size above 512 had no dramatic impact on performance.
I also tested shared_buffers settings between 2*max_connections (i.e. 128*2) and 1024, but they had no dramatic impact on performance; my tests did not focus on this.
It seems that:

1: Setting memory limits so high that the machine starts using swap space is WORSE than giving postgres as little memory as possible.

2: The sort_mem setting has a bigger impact on performance than effective_cache_size, as long as the db cache holds at least hundreds of disk pages. More than 1000 disk pages in effective_cache_size makes no sense.

I cannot make reliable simulations with this database under real user load :(, so with more concurrent users the results may be different. Maybe a good "stress test" suite for the postgresql database is needed for future testing.
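
For reference, the per-case measurements above can be reproduced with
session-level settings along these lines; the table name is hypothetical
(the post just says "table"):

-- Case 3 settings from above, applied to one session only
SET sort_mem = 16000;              -- kB
SET effective_cache_size = 512;    -- 8 kB pages, about 4 MB

EXPLAIN ANALYZE
SELECT * FROM test_table
WHERE owner = '10000'
ORDER BY id
OFFSET 100 LIMIT 50;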


Nov 22 '05 #16
On Wednesday 11 February 2004 09:42, NTPT wrote:

It seems that:

1: Setting memory limits so high that the machine starts using swap space
is WORSE than giving postgres as little memory as possible.
Swapping is always bad news.
2: The sort_mem setting has a bigger impact on performance than
effective_cache_size, as long as the db cache holds at least hundreds of
disk pages. More than 1000 disk pages in effective_cache_size makes no sense.

I cannot make reliable simulations with this database under real user load
:(, so with more concurrent users the results may be different. Maybe a good
"stress test" suite for the postgresql database is needed for future testing.


It will be radically different with concurrent users. Your large sort_mem will
apply to every sort they do and so you will probably end up going into swap
again.

This is one of the reasons why performance tuning is complicated, and you need
to test a real-world workload wherever possible.
--
Richard Huxton
Archonet Ltd
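
One way to limit that multiplication effect, assuming 7.4 or later where
SET LOCAL is available, is to keep the global sort_mem modest and raise it
only around the few queries that genuinely need a big sort; a sketch with
hypothetical names:

BEGIN;
SET LOCAL sort_mem = 32768;   -- 32 MB for this transaction only (work_mem in 8.0+)
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC;
COMMIT;
-- sort_mem reverts to its normal (small) value after COMMIT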


Nov 22 '05 #17
On Wed, Feb 11, 2004 at 12:26:06AM -0500, Tom Lane wrote:
Giving PG half the RAM is counterproductive no matter what --- that
pretty much guarantees that every page that's in RAM will be in RAM
twice, once in PG buffers and once in kernel buffers. The two


Well, unless you're using an OS which allows you to set a nocache
option for a filesystem (Solaris allows this on UFS, for instance).
But the kernel cache, in my experience, is _still_ the winner,
because the LRU database cache isn't real efficient at large sizes.

--
Andrew Sullivan


Nov 22 '05 #18
On Wed, 11 Feb 2004, NTPT wrote:
Take 1900 ms. In this case I tried increasing effective_cache_size step
by step (64, 128, 256, 512, 1024), but raising effective_cache_size above
512 had no dramatic impact on performance.
Note that effective_cache_size ONLY affects the query plan chosen. It has
no effect on the actual size of the kernel or postgresql caches. It is
just a hint to the planner about how large the kernel cache is.
I also tested shared_buffers settings between 2*max_connections (i.e.
128*2) and 1024, but they had no dramatic impact on performance; my tests
did not focus on this.


If the data you're tossing around is ~1 megabyte, then having more than 1
megabyte in shared buffers won't help. Basically, if shared_buffers is
smaller than the data you're throwing about, then increasing it tends to
help; if it's larger than the amount of data being pulled / joined / etc.,
then making it larger doesn't help.
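
A quick sketch of that point: changing effective_cache_size alters only the
plan the optimizer picks, which you can watch directly with EXPLAIN (the
table and column are hypothetical, and which plan wins depends on the
actual data):

SET effective_cache_size = 64;       -- tell the planner almost nothing is cached (8 kB pages)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;   -- index scans look more expensive

SET effective_cache_size = 100000;   -- tell the planner roughly 800 MB is cached
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;   -- index scans look cheaper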

Nov 22 '05 #19
On Tuesday February 10 2004 11:17, Tom Lane wrote:

Well, if you go *really* small then you find a lot of CPU time gets
wasted shuffling data from kernel cache to PG cache. The sweet spot
for theory (1) seems to be to set shared_buffers in the range of 1000 to
10000 buffers. (Time was that that was a serious amount of RAM, but
not any more...)


Under theory (1), is there any consensus on what criteria one ought to use
when deciding between 1000 and 10000?

In general, would it be true to say that if one does *not* anticipate
contention for kernel disk cache space from non-DB processes (e.g., the
dedicated db server), then you probably want to use theory (1)? If one
*does* anticipate such contention (e.g. the all-on-one-box web-db app),
then you probably want to use theory (2) in order to ensure an adequate
cache?

Ed


Nov 22 '05 #20
