Optimizer choosing smaller index instead of right one

Ian Ribas

Hello,

This is probably a common problem, but I couldn't really find a direct answer in
the archives (or maybe just couldn't find one that satisfied me ;-). I created
an index specifically to help a query and the optimizer does not use it. It
prefers an older index that has one less column, but that yields much poorer
performance.

The query get the smallest date in a period, for some criteria. The one table
used in the query has 604494 rows. The index I created to try to optimize it is
"xie3cham_servico", and has all the three columns of the table that are used in
the query. But the index being used is actually xie2cham_servico (as can be seen
below) which has only two columns, one of them completely useless for the query!
I'm probably missing some setting, but I couldn't figure out which.

One last comment, if I test this on a server all by myself, after some test
(when, I imagine, all related disk pages are on cache) the time difference
between the two situations is much smaller (4secs against 2 secs), but, of
course, this doesn't help me much ...

Here is the associated information:

Table "public.cham_servico"
Column | Type | Modifiers
------------------+--------------------------------+-----------
dt_inicial | timestamp(0) without time zone | not null
cod_bxs | integer | not null
chave | integer | not null
identificacao | integer | not null
cod_cifra | character varying(25) | not null
dt_identificacao | timestamp(0) without time zone | not null
dt_ocupacao | timestamp(0) without time zone |
temp_troca | integer |
temp_ura | integer |
temp_filapa | integer |
temp_atendpa | integer |
temp_total | integer |
dt_finalizacao | timestamp(0) without time zone |
pa | integer |
flg_tipodiscagem | character(1) |
cod_liberservico | integer |
dt_login | timestamp(0) without time zone |
id_chave_grupo | character varying(20) |
Indexes: xpkcham_servico primary key btree (dt_inicial, cod_bxs, chave,
identificacao, cod_cifra, dt_identificacao),
xie1cham_servico btree (dt_finalizacao, cod_cifra),
xie2cham_servico btree (id_chave_grupo, dt_inicial),
xie3cham_servico btree (id_chave_grupo, cod_bxs, dt_finalizacao)
Foreign Key constraints: rchamadaxservico FOREIGN KEY (dt_inicial, cod_bxs,
chave, identificacao) REFERENCES cham_chamada(dt_inicial, cod_bxs, chave,
identificacao) ON UPDATE NO ACTION ON DELETE NO ACTION

explain analyze SELECT dt_finalizacao FROM cham_servico WHERE id_chave_grupo =
'7458' AND cod_bxs = 1 AND dt_finalizacao > to_timestamp
('2004-04-20','YYYY-MM-DD') AND dt_finalizacao < now() ORDER BY dt_finalizacao
ASC LIMIT 1;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=15663.77..15663.77 rows=1 width=8) (actual time=14002.66..14002.66
rows=1 loops=1)
-> Sort (cost=15663.77..15663.83 rows=26 width=8) (actual
time=14002.65..14002.65 rows=2 loops=1)
Sort Key: dt_finalizacao
-> Index Scan using xie2cham_servico on cham_servico
(cost=0.00..15663.15 rows=26 width=8) (actual time=13604.37..13980.16 rows=3173
loops=1)
Index Cond: (id_chave_grupo = '7458'::character varying)
Filter: ((cod_bxs = 1) AND ((dt_finalizacao)::timestamp with time
zone > to_timestamp('2004-04-20'::text, 'YYYY-MM-DD'::text)) AND
((dt_finalizacao)::timestamp with time zone < now()))
Total runtime: 45014.06 msec
(7 rows)

Time: 45105,73 ms

If I "drop index xie2cham_servico;", I get :

explain analyze SELECT dt_finalizacao FROM cham_servico WHERE id_chave_grupo =
'7458' AND cod_bxs = 1 AND dt_finalizacao > to_timestamp
('2004-04-20','YYYY-MM-DD') AND dt_finalizacao < now() ORDER BY dt_finalizacao
ASC LIMIT 1;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=17600.92..17600.92 rows=1 width=8) (actual time=8750.64..8750.64
rows=1 loops=1)
-> Sort (cost=17600.92..17600.99 rows=26 width=8) (actual
time=8750.63..8750.63 rows=2 loops=1)
Sort Key: dt_finalizacao
-> Index Scan using xie3cham_servico on cham_servico
(cost=0.00..17600.30 rows=26 width=8) (actual time=8432.37..8743.29 rows=3176
loops=1)
Index Cond: ((id_chave_grupo = '7458'::character varying) AND
(cod_bxs = 1))
Filter: (((dt_finalizacao)::timestamp with time zone >
to_timestamp('2004-04-20'::text, 'YYYY-MM-DD'::text)) AND
((dt_finalizacao)::timestamp with time zone < now()))
Total runtime: 8751.50 msec
(7 rows)

Time: 8758,36 ms

Some more environment info:

Postgres 7.3.4 on Linux Redhat 9.

show all:

name | setting
--------------------------------+---------------------------------------
australian_timezones | off
authentication_timeout | 60
autocommit | on
checkpoint_segments | 3
checkpoint_timeout | 300
client_encoding | SQL_ASCII
client_min_messages | notice
commit_delay | 0
commit_siblings | 5
cpu_index_tuple_cost | 0.001
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
DateStyle | ISO with US (NonEuropean) conventions
db_user_namespace | off
deadlock_timeout | 1000
debug_pretty_print | off
debug_print_parse | off
debug_print_plan | off
debug_print_rewritten | off
default_statistics_target | 10
default_transaction_isolation | read committed
dynamic_library_path | $libdir
effective_cache_size | 1000
enable_hashjoin | on
enable_indexscan | on
enable_mergejoin | on
enable_nestloop | on
enable_seqscan | off
enable_sort | on
enable_tidscan | on
explain_pretty_print | on
fixbtree | on
fsync | on
geqo | on
geqo_effort | 1
geqo_generations | 0
geqo_pool_size | 0
geqo_random_seed | -1
geqo_selection_bias | 2
geqo_threshold | 11
hostname_lookup | off
krb_server_keyfile | FILE:/etc/sysconfig/pgsql/krb5.keytab
lc_messages | pt_BR
lc_monetary | pt_BR
lc_numeric | pt_BR
lc_time | pt_BR
log_connections | off
log_duration | off
log_min_error_statement | panic
log_pid | off
log_statement | off
log_timestamp | off
max_connections | 32
max_expr_depth | 10000
max_files_per_process | 1000
max_fsm_pages | 10000
max_fsm_relations | 1000
max_locks_per_transaction | 64
password_encryption | on
port | 5432
pre_auth_delay | 0
random_page_cost | 4
search_path | $user,public
server_encoding | SQL_ASCII
server_min_messages | notice
shared_buffers | 64
show_executor_stats | off
show_parser_stats | off
show_planner_stats | off
show_source_port | off
show_statement_stats | off
silent_mode | off
sort_mem | 1024
sql_inheritance | on
ssl | off
statement_timeout | 0
stats_block_level | off
stats_command_string | off
stats_reset_on_server_start | on
stats_row_level | off
stats_start_collector | on
superuser_reserved_connections | 2
syslog | 0
syslog_facility | LOCAL0
syslog_ident | postgres
tcpip_socket | on
TimeZone | unknown
trace_notify | off
TRANSACTION ISOLATION LEVEL | READ COMMITTED
transform_null_equals | off
unix_socket_directory | unset
unix_socket_group | unset
unix_socket_permissions | 511
vacuum_mem | 8192
virtual_host | unset
wal_buffers | 8
wal_debug | 0
wal_sync_method | fdatasync
zero_damaged_pages | off

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #1

Subscribe Post Reply

1559

Ian Ribas

Please ignore my previous message, its a duplicate.Sorry for the
inconvenience!

Ian.
Ian Ribas wrote:

....

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #2

Ian Ribas

Please ignore my previous message, its a duplicate.Sorry for the
inconvenience!

Ian.
Ian Ribas wrote:

....

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #3

Tom Lane

Ian Ribas <ia*******@digitro.com.br> writes:

The index I created to try to optimize it is "xie3cham_servico", and
has all the three columns of the table that are used in the query. But
the index being used is actually xie2cham_servico (as can be seen
below) which has only two columns, one of them completely useless for
the query! I'm probably missing some setting, but I couldn't figure
out which.
The optimizer's cost equations do say that a smaller index should be
preferred over a larger one if the number of rows obtainable from the
index is comparable (which I trust you will agree is reasonable).
I think the real issue here is a datatype conflict. Look closely
at the filter conditions:
-> Index Scan using xie2cham_servico on cham_servico (cost=0.00..15663.15 rows=26 width=8) (actual time=13604.37..13980.16 rows=3173 loops=1)
Index Cond: (id_chave_grupo = '7458'::character varying)
Filter: ((cod_bxs = 1) AND ((dt_finalizacao)::timestamp with time zone > to_timestamp('2004-04-20'::text, 'YYYY-MM-DD'::text)) AND ((dt_finalizacao)::timestamp with time zone < now()))

The two comparisons on dt_finalizacao are unindexable because they
coerce it to a different datatype; the "<" and ">" are the operators on
timestamp with time zone and so they are not relevant to an index on
timestamp without time zone.

Very likely your column really ought to be of type timestamp with time
zone. If you are convinced it should be timestamp without time zone
then you need to fix the values you are comparing to.

I'm also a bit disturbed by the fact that the rows estimates are off by
more than two orders of magnitude (26 est vs 3173 actual). It would
perhaps help if you wrote the boundary date as a simple constant:

... AND dt_finalizacao > '2004-04-20' AND ...

so that the planner had some chance of comparing the constant to its
statistics about the distribution of dt_finalizacao. to_timestamp is
not a constant-foldable function (because it depends on the current
TimeZone setting) and so the query as you've written it looks like

... AND dt_finalizacao > unpredictable_expression AND ...

as far as the planner knows. (I'm not sure this will help though;
it may be that the big problem is that the three columns are not
independent, which is something the planner doesn't know because it has
no cross-column statistics.)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 23 '05 #4

Tom Lane

Ian Ribas <ia*******@digitro.com.br> writes:

The index I created to try to optimize it is "xie3cham_servico", and
has all the three columns of the table that are used in the query. But
the index being used is actually xie2cham_servico (as can be seen
below) which has only two columns, one of them completely useless for
the query! I'm probably missing some setting, but I couldn't figure
out which.
The optimizer's cost equations do say that a smaller index should be
preferred over a larger one if the number of rows obtainable from the
index is comparable (which I trust you will agree is reasonable).
I think the real issue here is a datatype conflict. Look closely
at the filter conditions:
-> Index Scan using xie2cham_servico on cham_servico (cost=0.00..15663.15 rows=26 width=8) (actual time=13604.37..13980.16 rows=3173 loops=1)
Index Cond: (id_chave_grupo = '7458'::character varying)
Filter: ((cod_bxs = 1) AND ((dt_finalizacao)::timestamp with time zone > to_timestamp('2004-04-20'::text, 'YYYY-MM-DD'::text)) AND ((dt_finalizacao)::timestamp with time zone < now()))

Nov 23 '05 #5

Ian Ribas

Effectively, I hadn't seen the cast being done on the field
dt_finalizacao! The query actually runs inside a pgplsql procedure, so
the parameters are unknown constants (variables), but the situation got
much much better when I forced a cast on the constant part of the
condition (as follows), so allowing the planner to correctly choose an
index :

explain analyze SELECT dt_finalizacao
FROM cham_servico
WHERE id_chave_grupo = '7458' AND cod_bxs = 1
AND dt_finalizacao > CAST(to_timestamp ('2004-04-20','YYYY-MM-DD') AS
TIMESTAMP(0) WITHOUT TIME ZONE)
AND dt_finalizacao < CAST(now() AS TIMESTAMP(0) WITHOUT TIME ZONE)
ORDER BY dt_finalizacao ASC LIMIT 1;

QUERY
PLAN

------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..364.98 rows=1 width=8) (actual time=52.73..52.90
rows=1 loops=1)
-> Index Scan using xie1cham_servico on cham_servico
(cost=0.00..9756.63 rows=27 width=8) (actual time=52.73..52.89 rows=2
loops=1)
Index Cond: ((dt_finalizacao >
(to_timestamp('2004-04-20'::text, 'YYYY-MM-DD'::text))::timestamp(0)
without time zone) AND (dt_finalizacao < (now())::timestamp(0) without
time zone))
Filter: ((id_chave_grupo = '7458'::character varying) AND
(cod_bxs = 1))
Total runtime: 53.11 msec
(5 rows)

Unfortunately, I really must use timestamp without time zone for some
other unrelated reasons ...

Thank you very much for your attention

Best wishes,
Ian.

Tom Lane wrote:

Ian Ribas <ia*******@digitro.com.br> writes:

The index I created to try to optimize it is "xie3cham_servico", and
has all the three columns of the table that are used in the query. But
the index being used is actually xie2cham_servico (as can be seen
below) which has only two columns, one of them completely useless for
the query! I'm probably missing some setting, but I couldn't figure
out which.

The optimizer's cost equations do say that a smaller index should be
preferred over a larger one if the number of rows obtainable from the
index is comparable (which I trust you will agree is reasonable).
I think the real issue here is a datatype conflict. Look closely
at the filter conditions:

-> Index Scan using xie2cham_servico on cham_servico (cost=0.00..15663.15 rows=26 width=8) (actual time=13604.37..13980.16 rows=3173 loops=1)
Index Cond: (id_chave_grupo = '7458'::character varying)
Filter: ((cod_bxs = 1) AND ((dt_finalizacao)::timestamp with time zone > to_timestamp('2004-04-20'::text, 'YYYY-MM-DD'::text)) AND ((dt_finalizacao)::timestamp with time zone < now()))

The two comparisons on dt_finalizacao are unindexable because they
coerce it to a different datatype; the "<" and ">" are the operators on
timestamp with time zone and so they are not relevant to an index on
timestamp without time zone.

Very likely your column really ought to be of type timestamp with time
zone. If you are convinced it should be timestamp without time zone
then you need to fix the values you are comparing to.

I'm also a bit disturbed by the fact that the rows estimates are off by
more than two orders of magnitude (26 est vs 3173 actual). It would
perhaps help if you wrote the boundary date as a simple constant:

... AND dt_finalizacao > '2004-04-20' AND ...

so that the planner had some chance of comparing the constant to its
statistics about the distribution of dt_finalizacao. to_timestamp is
not a constant-foldable function (because it depends on the current
TimeZone setting) and so the query as you've written it looks like

... AND dt_finalizacao > unpredictable_expression AND ...

as far as the planner knows. (I'm not sure this will help though;
it may be that the big problem is that the three columns are not
independent, which is something the planner doesn't know because it has
no cross-column statistics.)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #6

Ian Ribas

Ian Ribas <ia*******@digitro.com.br> writes:

The index I created to try to optimize it is "xie3cham_servico", and
has all the three columns of the table that are used in the query. But
the index being used is actually xie2cham_servico (as can be seen
below) which has only two columns, one of them completely useless for
the query! I'm probably missing some setting, but I couldn't figure
out which.

The optimizer's cost equations do say that a smaller index should be
preferred over a larger one if the number of rows obtainable from the
index is comparable (which I trust you will agree is reasonable).
I think the real issue here is a datatype conflict. Look closely
at the filter conditions:

-> Index Scan using xie2cham_servico on cham_servico (cost=0.00..15663.15 rows=26 width=8) (actual time=13604.37..13980.16 rows=3173 loops=1)
Index Cond: (id_chave_grupo = '7458'::character varying)
Filter: ((cod_bxs = 1) AND ((dt_finalizacao)::timestamp with time zone > to_timestamp('2004-04-20'::text, 'YYYY-MM-DD'::text)) AND ((dt_finalizacao)::timestamp with time zone < now()))

The two comparisons on dt_finalizacao are unindexable because they
coerce it to a different datatype; the "<" and ">" are the operators on
timestamp with time zone and so they are not relevant to an index on
timestamp without time zone.

Very likely your column really ought to be of type timestamp with time
zone. If you are convinced it should be timestamp without time zone
then you need to fix the values you are comparing to.

I'm also a bit disturbed by the fact that the rows estimates are off by
more than two orders of magnitude (26 est vs 3173 actual). It would
perhaps help if you wrote the boundary date as a simple constant:

... AND dt_finalizacao > '2004-04-20' AND ...

so that the planner had some chance of comparing the constant to its
statistics about the distribution of dt_finalizacao. to_timestamp is
not a constant-foldable function (because it depends on the current
TimeZone setting) and so the query as you've written it looks like

... AND dt_finalizacao > unpredictable_expression AND ...

as far as the planner knows. (I'm not sure this will help though;
it may be that the big problem is that the three columns are not
independent, which is something the planner doesn't know because it has
no cross-column statistics.)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #7

Similar topics

strange optimizer

by: Carlo Paccanoni | last post by:

I tried this: use northwind go SELECT OrderDate FROM Orders WHERE OrderDate > '19950101' see the query plan? ok

Microsoft SQL Server

Weird Optimizer Choice

by: Philip Yale | last post by:

I'm very puzzled by the choice of NC index being made by the optimizer in this example. I don't actually think it should use an NC index at all. I have: Table: CustomerStatus_T Single data...

Microsoft SQL Server

Optimizer isn't perfect

by: Greg Stark | last post by:

Hm, here's a query where the optimizer is choosing the wrong plan by far. I think it boils down to it guessing wrong on how selective an rtree index is, which I guess would be hard to predict. ...

PostgreSQL Database

Optimizer bad with sparse keys?

by: Peter Arrenbrecht | last post by:

Hi all We ran into a very annoying optimizer problem recently. We had added a new key and self-join relation to a large production table. That key will be filled very rarely and having just...

DB2 Database

Optimizer choosing bad plan?

by: Mike L. Bell | last post by:

Pulling our hair out over this one. We have two databases. One is used for OLTP. The other hosts old closed transactions and is mainly used for reporting. Every month we identify rows in the...

DB2 Database

Optimizer choosing smaller index instead of right one

by: Ian Ribas | last post by:

Hello, This is probably a common problem, but I couldn't really find a direct answer in the archives (or maybe just couldn't find one that satisfied me ;-). I created an index specifically to...

PostgreSQL Database

Performance issue: Need to force Optimizer use a specific Index

by: Venkatesh Subbaramu | last post by:

Hi Expert DBAs, I am facing an issue with a query, can anyone help me through? I have a Query: UPDATE VMUS.MSG_MSTR a SET MSG_STA_ID =5 WHERE a.PROC_ID =2 AND a.file_type_id =1 AND MSG_STA_ID...

DB2 Database

Optimizer questions

by: aj | last post by:

DB2 LUW v8.2 FP 14 RHAS 2.1 Sorry if these are newbie questions. Optimizer stuff is black magic to me. For both of these, assume stats are current and an even distribution of data....

DB2 Database

Distributed statistics and optimizer

by: P. Adhia | last post by:

Hi, Consider the following query select * from t1 where c1 = ? and c2 = ?

DB2 Database

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server