473,702 Members | 2,767 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Planner choice & tuning


The planner is choosing a sequential scan for my query. I am trying
to understand why since it is clearly not the fastest choice, and
what the proper tuning dial is to adjust here. Here's the query:

select count(1) from measurement where measure_id = 98;

TIA. Here are the details:

% psql -c "\d measurement"
Table "public.measure ment"
Column | Type | Modifiers
------------+--------------------------+-------------------------------------------------------------
id | integer | not null default nextval('public .measurement_id _seq'::text)
host_id | integer | not null
measure_id | integer | not null
value | double precision | not null
duration | double precision | not null default 0.0
moment | timestamp with time zone | not null default ('now'::text):: timestamp(6) with time zone
source_id | integer | not null default 1
Indexes: measurement_dur ation_idx btree (duration),
measurement_hos t_idx btree (host_id),
measurement_mea sure_idx btree (measure_id),
measurement_mom ent_idx btree (moment),
measurement_sou rce_idx btree (source_id),
measurement_val ue_idx btree (value)
Foreign Key constraints: $2 FOREIGN KEY (measure_id) REFERENCES measure(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
$1 FOREIGN KEY (source_id) REFERENCES measurement_sou rce(id) ON UPDATE NO ACTION ON DELETE NO ACTION

% psql -c "analyze measurement"
ANALYZE

% psql -c "select count(1) from measurement"
count
---------
1800866
(1 row)
% psql -c "select count(1) from measurement where measure_id = 98"
count
-------
38862
(1 row)

% time psql -c "set enable_seqscan= no; explain analyze select count(1) from measurement where measure_id = 98"
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=215208.66 ..215208.66 rows=1 width=0) (actual time=904.58..90 4.58 rows=1 loops=1)
-> Index Scan using measurement_mea sure_idx on measurement (cost=0.00..215 062.64 rows=58408 width=0) (actual time=0.17..843. 78 rows=38866 loops=1)
Index Cond: (measure_id = 98)
Total runtime: 904.77 msec
(4 rows)
real 0m1.298s
user 0m0.010s
sys 0m0.000s

% time psql -c "explain analyze select count(1) from measurement where measure_id = 98"
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=97857.09. .97857.09 rows=1 width=0) (actual time=2493.90..2 493.90 rows=1 loops=1)
-> Seq Scan on measurement (cost=0.00..977 11.07 rows=58408 width=0) (actual time=12.94..243 0.08 rows=38866 loops=1)
Filter: (measure_id = 98)
Total runtime: 2494.11 msec
(4 rows)
real 0m2.885s
user 0m0.000s
sys 0m0.000s
This seems to be saying the planner thinks its less expensive to do the
sequential scan, but why?

Including pg_stats data in case it is relevant here.

% psql -c "select * from pg_stats where tablename = 'measurement' and attname = 'measure_id'"
schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_val s | most_common_fre qs | histogram_bound s | correlation
------------+-------------+------------+-----------+-----------+------------+-------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------+-------------
public | measurement | measure_id | 0 | 4 | 52 | {104,108,113,12 4,106,110,101,1 07,112,109} | {0.084,0.055666 7,0.052,0.05,0. 048,0.0473333,0 .0383333,0.0363 333,0.034,0.032 6667} | {23,36,39,43,85 ,89,100,111,120 ,122,128} | 0.232087
(1 row)

% psql -c "select name, setting from pg_settings where name like 'random%'"
name | setting
------------------+---------
random_page_cos t | 4
(1 row)


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 23 '05 #1
2 1616
On Tue, 2004-06-22 at 16:47, Ed L. wrote:
The planner is choosing a sequential scan for my query. I am trying
to understand why since it is clearly not the fastest choice, and
what the proper tuning dial is to adjust here. Here's the query:

select count(1) from measurement where measure_id = 98;

TIA. Here are the details:

% psql -c "\d measurement"
Table "public.measure ment"
Column | Type | Modifiers
------------+--------------------------+-------------------------------------------------------------
id | integer | not null default nextval('public .measurement_id _seq'::text)
host_id | integer | not null
measure_id | integer | not null
value | double precision | not null
duration | double precision | not null default 0.0
moment | timestamp with time zone | not null default ('now'::text):: timestamp(6) with time zone
source_id | integer | not null default 1
Indexes: measurement_dur ation_idx btree (duration),
measurement_hos t_idx btree (host_id),
measurement_mea sure_idx btree (measure_id),
measurement_mom ent_idx btree (moment),
measurement_sou rce_idx btree (source_id),
measurement_val ue_idx btree (value)
Foreign Key constraints: $2 FOREIGN KEY (measure_id) REFERENCES measure(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
$1 FOREIGN KEY (source_id) REFERENCES measurement_sou rce(id) ON UPDATE NO ACTION ON DELETE NO ACTION

% psql -c "analyze measurement"
ANALYZE

% psql -c "select count(1) from measurement"
count
---------
1800866
(1 row)
% psql -c "select count(1) from measurement where measure_id = 98"
count
-------
38862
(1 row)

% time psql -c "set enable_seqscan= no; explain analyze select count(1) from measurement where measure_id = 98"
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=215208.66 ..215208.66 rows=1 width=0) (actual time=904.58..90 4.58 rows=1 loops=1)
-> Index Scan using measurement_mea sure_idx on measurement (cost=0.00..215 062.64 rows=58408 width=0) (actual time=0.17..843. 78 rows=38866 loops=1)
Index Cond: (measure_id = 98)
Total runtime: 904.77 msec
(4 rows)
real 0m1.298s
user 0m0.010s
sys 0m0.000s

% time psql -c "explain analyze select count(1) from measurement where measure_id = 98"
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=97857.09. .97857.09 rows=1 width=0) (actual time=2493.90..2 493.90 rows=1 loops=1)
-> Seq Scan on measurement (cost=0.00..977 11.07 rows=58408 width=0) (actual time=12.94..243 0.08 rows=38866 loops=1)
Filter: (measure_id = 98)
Total runtime: 2494.11 msec
(4 rows)
real 0m2.885s
user 0m0.000s
sys 0m0.000s
This seems to be saying the planner thinks its less expensive to do the
sequential scan, but why?

Including pg_stats data in case it is relevant here.

% psql -c "select * from pg_stats where tablename = 'measurement' and attname = 'measure_id'"
schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_val s | most_common_fre qs | histogram_bound s | correlation
------------+-------------+------------+-----------+-----------+------------+-------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------+-------------
public | measurement | measure_id | 0 | 4 | 52 | {104,108,113,12 4,106,110,101,1 07,112,109} | {0.084,0.055666 7,0.052,0.05,0. 048,0.0473333,0 .0383333,0.0363 333,0.034,0.032 6667} | {23,36,39,43,85 ,89,100,111,120 ,122,128} | 0.232087
(1 row)

% psql -c "select name, setting from pg_settings where name like 'random%'"
name | setting
------------------+---------
random_page_cos t | 4
(1 row)


I'd say your random_page_cos t is too low for your setup. While there's
a slight misguess on the planner in the number of rows (38k verus 58k)
it's not that big. Try values between 1.2 and 2.0. Most larger servers
with plenty of memory work well around 1.3 to 1.5. Also, make sure your
effective_cache _size is properly set.
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #2
"Ed L." <pg***@bluepolk a.net> writes:
This seems to be saying the planner thinks its less expensive to do the
sequential scan, but why?


Because it thinks it's less expensive ;-)

There are a couple issues here: one is overestimation of the number of
rows matching the query. That is a statistical issue and should be
fixable by increasing the statistics target for the column. With an
accurate rows estimate the seqscan cost estimate would not change but
the indexscan cost estimate would decrease approximately proportionally.

Allowing for the row estimation error, the indexscan cost estimate is
still about 4x what it ought to be, which means that you could bring the
estimated cost (for this query anyway) in line with reality by setting
random_page_cos t = 1.

Note however that we are defining "reality" as "the results of this one
single experiment". You should realize in particular that the table is
probably fully cached in memory during your test, which is a scenario
in which random_page_cos t actually is 1.0 (given reasonable assumptions
about the behavior of the kernel's cache, anyway). If you optimize for
this case then you are likely to be pessimizing the behavior for larger
tables that don't fit in memory.

My suggestion would be to lower random_page_cos t to 3 or so, which would
be enough to tip the decision to indexscan for this case, though not to
make the estimated cost really truly correct. You should however run a
larger set of experiments before doing anything, and realize that any
random_page_cos t setting is a compromise because the model doesn't take
all the variables into account.

You can find much more about this issue in the pgsql-performance archives.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 23 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
1513
by: Michele | last post by:
Hi all, There's a web-based project I'm working on, which my boss insists should be implemented as a java web service: I know Java, but I'm a complete newcomer to web services, so I would badly need your enlightened opinion to figure out if my boss is right. I'm going to have two sorts of clients connecting to my (Apache) web server: a "planner" user, making schedules of activities to be performed and recording them on a MySQL database,...
3
2906
by: Maryam | last post by:
Hi, I am having problems getting anything useful out of the index tuning wizard. I have created a table and inserted data into it. When i run the index tuning wizard i expect 2 indexes to be recommended so the book says (Index011 with a key on the uniqueid column and a non clustered index named table02 with a key on the col03 and LongCol02) Instead i get nothing being recommended.
2
1278
by: Chris Travers | last post by:
Hi all; I suspect I know what the answer here is, but I thought I would ask anyway. How transparent are SQL language functions from the planner's perspective? For example if I create a table: CREATE TABLE proof_of_concept( poc_id serial primary key, testing text );
10
4658
by: Alex Greem | last post by:
Dear all, Our database (DB2 Workgroup 7.2 FP12) is constantly under heavy load. Most time CPU usage (1 Pentium3 1Ghz) is more 50% busy. We have 3GB RAM memory Our normal workload is 200-300 dyn.sql/s. The action question is there way to eliminate direct I/O? By definition direct I/O is I/O that bypass bufferpools and used for LONG VARCHAR and LOBs data.
1
1456
by: Martin Marques | last post by:
I was reading the section about Planner Method Configuration, and I just don't get why all the options have this message: "This is used for debugging the query planner". Are they all debugging options, because they don't look like. http://www.postgresql.org/docs/current/interactive/ -> runtime-config.html#RUNTIME-CONFIG-QUERY -- 15:45:02 up 64 days, 22:01, 2 users, load average: 1.49, 1.32, 0.85...
3
2027
by: mleal | last post by:
Does anyone have some more detailed information about how Oracle and MS implement / allow Tuning on Oracle 10g and SQL Server 2005 and the differences between them? Which of them, In a deep comparison about it, allow better tuning and why. Regards, Marcio Evangelista
13
4597
by: atlaste | last post by:
Hi, I'm currently developing an application that uses a lot of computational power, disk access and memory caching (to be more exact: an information retrieval platform). In these kind of applications the last thing that remains is bare performance tuning. So for example, you can do an 'if then else' on a bit like a 'case/ switch', an 'if/then/else' and as a multiplication with a static buffer. Or, you can do sorting with an inline...
3
7922
by: dunleav1 | last post by:
In 9.1 and 9.5 (Linux 64 bit) when a buffer pool is set to self- tuning, how are blocks configured in respect to blocked vs non-blocked when self-tuning is set to on? (ie) I have one bufferpool that is 16k that the IBMDEFAULTBP is shared between all tablespaces. The data tablespace has prefetch automatic set to on.
2
3115
by: satish mullapudi | last post by:
Hi All, We are using DB2 v8.2 and are planning to migrate to DB2 v9.5. The settings of our servers are as follows: OS: WIN (4 servers) , AIX(1 server) RAM: 3GB HDD: 40GB Other huge s/ws: WAS 6.1, IHS 6.1, other small 3rd party tools ( these doesn't take much of the RAM.
0
8652
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9234
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9089
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8983
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
7831
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
4412
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4667
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3107
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2036
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.