Planner choice & tuning - PostgreSQL Database

Ed L.

The planner is choosing a sequential scan for my query. I am trying
to understand why since it is clearly not the fastest choice, and
what the proper tuning dial is to adjust here. Here's the query:

select count(1) from measurement where measure_id = 98;

TIA. Here are the details:

% psql -c "\d measurement"
Table "public.measurement"
Column | Type | Modifiers
------------+--------------------------+-------------------------------------------------------------
id | integer | not null default nextval('public.measurement_id_seq'::text)
host_id | integer | not null
measure_id | integer | not null
value | double precision | not null
duration | double precision | not null default 0.0
moment | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
source_id | integer | not null default 1
Indexes: measurement_duration_idx btree (duration),
measurement_host_idx btree (host_id),
measurement_measure_idx btree (measure_id),
measurement_moment_idx btree (moment),
measurement_source_idx btree (source_id),
measurement_value_idx btree (value)
Foreign Key constraints: $2 FOREIGN KEY (measure_id) REFERENCES measure(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
$1 FOREIGN KEY (source_id) REFERENCES measurement_source(id) ON UPDATE NO ACTION ON DELETE NO ACTION

% psql -c "analyze measurement"
ANALYZE

% psql -c "select count(1) from measurement"
count
---------
1800866
(1 row)
% psql -c "select count(1) from measurement where measure_id = 98"
count
-------
38862
(1 row)

% time psql -c "set enable_seqscan=no; explain analyze select count(1) from measurement where measure_id = 98"
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=215208.66..215208.66 rows=1 width=0) (actual time=904.58..904.58 rows=1 loops=1)
-> Index Scan using measurement_measure_idx on measurement (cost=0.00..215062.64 rows=58408 width=0) (actual time=0.17..843.78 rows=38866 loops=1)
Index Cond: (measure_id = 98)
Total runtime: 904.77 msec
(4 rows)
real 0m1.298s
user 0m0.010s
sys 0m0.000s

% time psql -c "explain analyze select count(1) from measurement where measure_id = 98"
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=97857.09..97857.09 rows=1 width=0) (actual time=2493.90..2493.90 rows=1 loops=1)
-> Seq Scan on measurement (cost=0.00..97711.07 rows=58408 width=0) (actual time=12.94..2430.08 rows=38866 loops=1)
Filter: (measure_id = 98)
Total runtime: 2494.11 msec
(4 rows)
real 0m2.885s
user 0m0.000s
sys 0m0.000s
This seems to be saying the planner thinks its less expensive to do the
sequential scan, but why?

Including pg_stats data in case it is relevant here.

% psql -c "select * from pg_stats where tablename = 'measurement' and attname = 'measure_id'"
schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
------------+-------------+------------+-----------+-----------+------------+-------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------+-------------
public | measurement | measure_id | 0 | 4 | 52 | {104,108,113,124,106,110,101,107,112,109} | {0.084,0.0556667,0.052,0.05,0.048,0.0473333,0.0383 333,0.0363333,0.034,0.0326667} | {23,36,39,43,85,89,100,111,120,122,128} | 0.232087
(1 row)

% psql -c "select name, setting from pg_settings where name like 'random%'"
name | setting
------------------+---------
random_page_cost | 4
(1 row)

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 23 '05 #1

Subscribe Post Reply

1589

Scott Marlowe

On Tue, 2004-06-22 at 16:47, Ed L. wrote:

The planner is choosing a sequential scan for my query. I am trying
to understand why since it is clearly not the fastest choice, and
what the proper tuning dial is to adjust here. Here's the query:

select count(1) from measurement where measure_id = 98;

TIA. Here are the details:

% psql -c "\d measurement"
Table "public.measurement"
Column | Type | Modifiers
------------+--------------------------+-------------------------------------------------------------
id | integer | not null default nextval('public.measurement_id_seq'::text)
host_id | integer | not null
measure_id | integer | not null
value | double precision | not null
duration | double precision | not null default 0.0
moment | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
source_id | integer | not null default 1
Indexes: measurement_duration_idx btree (duration),
measurement_host_idx btree (host_id),
measurement_measure_idx btree (measure_id),
measurement_moment_idx btree (moment),
measurement_source_idx btree (source_id),
measurement_value_idx btree (value)
Foreign Key constraints: $2 FOREIGN KEY (measure_id) REFERENCES measure(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
$1 FOREIGN KEY (source_id) REFERENCES measurement_source(id) ON UPDATE NO ACTION ON DELETE NO ACTION

% psql -c "analyze measurement"
ANALYZE

% psql -c "select count(1) from measurement"
count
---------
1800866
(1 row)
% psql -c "select count(1) from measurement where measure_id = 98"
count
-------
38862
(1 row)

% time psql -c "set enable_seqscan=no; explain analyze select count(1) from measurement where measure_id = 98"
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=215208.66..215208.66 rows=1 width=0) (actual time=904.58..904.58 rows=1 loops=1)
-> Index Scan using measurement_measure_idx on measurement (cost=0.00..215062.64 rows=58408 width=0) (actual time=0.17..843.78 rows=38866 loops=1)
Index Cond: (measure_id = 98)
Total runtime: 904.77 msec
(4 rows)
real 0m1.298s
user 0m0.010s
sys 0m0.000s

% time psql -c "explain analyze select count(1) from measurement where measure_id = 98"
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=97857.09..97857.09 rows=1 width=0) (actual time=2493.90..2493.90 rows=1 loops=1)
-> Seq Scan on measurement (cost=0.00..97711.07 rows=58408 width=0) (actual time=12.94..2430.08 rows=38866 loops=1)
Filter: (measure_id = 98)
Total runtime: 2494.11 msec
(4 rows)
real 0m2.885s
user 0m0.000s
sys 0m0.000s
This seems to be saying the planner thinks its less expensive to do the
sequential scan, but why?

Including pg_stats data in case it is relevant here.

% psql -c "select * from pg_stats where tablename = 'measurement' and attname = 'measure_id'"
schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
------------+-------------+------------+-----------+-----------+------------+-------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------+-------------
public | measurement | measure_id | 0 | 4 | 52 | {104,108,113,124,106,110,101,107,112,109} | {0.084,0.0556667,0.052,0.05,0.048,0.0473333,0.0383 333,0.0363333,0.034,0.0326667} | {23,36,39,43,85,89,100,111,120,122,128} | 0.232087
(1 row)

% psql -c "select name, setting from pg_settings where name like 'random%'"
name | setting
------------------+---------
random_page_cost | 4
(1 row)

I'd say your random_page_cost is too low for your setup. While there's
a slight misguess on the planner in the number of rows (38k verus 58k)
it's not that big. Try values between 1.2 and 2.0. Most larger servers
with plenty of memory work well around 1.3 to 1.5. Also, make sure your
effective_cache_size is properly set.
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #2

Tom Lane

"Ed L." <pg***@bluepolka.net> writes:

This seems to be saying the planner thinks its less expensive to do the
sequential scan, but why?

Because it thinks it's less expensive ;-)

There are a couple issues here: one is overestimation of the number of
rows matching the query. That is a statistical issue and should be
fixable by increasing the statistics target for the column. With an
accurate rows estimate the seqscan cost estimate would not change but
the indexscan cost estimate would decrease approximately proportionally.

Allowing for the row estimation error, the indexscan cost estimate is
still about 4x what it ought to be, which means that you could bring the
estimated cost (for this query anyway) in line with reality by setting
random_page_cost = 1.

Note however that we are defining "reality" as "the results of this one
single experiment". You should realize in particular that the table is
probably fully cached in memory during your test, which is a scenario
in which random_page_cost actually is 1.0 (given reasonable assumptions
about the behavior of the kernel's cache, anyway). If you optimize for
this case then you are likely to be pessimizing the behavior for larger
tables that don't fit in memory.

My suggestion would be to lower random_page_cost to 3 or so, which would
be enough to tip the decision to indexscan for this case, though not to
make the estimated cost really truly correct. You should however run a
larger set of experiments before doing anything, and realize that any
random_page_cost setting is a compromise because the model doesn't take
all the variables into account.

You can find much more about this issue in the pgsql-performance archives.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #3

Similar topics

planner/watcher web service?

by: Michele | last post by:

Hi all, There's a web-based project I'm working on, which my boss insists should be implemented as a java web service: I know Java, but I'm a complete newcomer to web services, so I would badly...

Java

Index Tuning Wizard - headache

by: Maryam | last post by:

Hi, I am having problems getting anything useful out of the index tuning wizard. I have created a table and inserted data into it. When i run the index tuning wizard i expect 2 indexes to be...

Microsoft SQL Server

Planner question regarding functions

by: Chris Travers | last post by:

Hi all; I suspect I know what the answer here is, but I thought I would ask anyway. How transparent are SQL language functions from the planner's perspective? For example if I create a table:...

PostgreSQL Database

Direct reads & In memory Database

by: Alex Greem | last post by:

Dear all, Our database (DB2 Workgroup 7.2 FP12) is constantly under heavy load. Most time CPU usage (1 Pentium3 1Ghz) is more 50% busy. We have 3GB RAM memory Our normal workload is 200-300...

DB2 Database

Planner configuration

by: Martin Marques | last post by:

I was reading the section about Planner Method Configuration, and I just don't get why all the options have this message: "This is used for debugging the query planner". Are they all debugging...

PostgreSQL Database

Tuning on Oracle 10g and SQL Server 2005

by: mleal | last post by:

Does anyone have some more detailed information about how Oracle and MS implement / allow Tuning on Oracle 10g and SQL Server 2005 and the differences between them? Which of them, In a deep...

Microsoft SQL Server

Performance tuning

by: atlaste | last post by:

Hi, I'm currently developing an application that uses a lot of computational power, disk access and memory caching (to be more exact: an information retrieval platform). In these kind of...

C# / C Sharp

Self tuning bufferpool

by: dunleav1 | last post by:

In 9.1 and 9.5 (Linux 64 bit) when a buffer pool is set to self- tuning, how are blocks configured in respect to blocked vs non-blocked when self-tuning is set to on? (ie) I have one bufferpool...

DB2 Database

Advs & disadvs of Self Tuning Memory Management feature of DB2 v9.5

by: satish mullapudi | last post by:

Hi All, We are using DB2 v8.2 and are planning to migrate to DB2 v9.5. The settings of our servers are as follows: OS: WIN (4 servers) , AIX(1 server) RAM: 3GB HDD: 40GB Other huge s/ws:...

DB2 Database

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice