Unable to use index? - PostgreSQL Database

Edmund Dengler

Hi folks!

A query I am running does not seem to use indexes that are available
(running version 7.4.2). I have the following table:

=> \d replicated
                                     Table "public.replicated"
     Column      |           Type           |                      Modifiers
-----------------+--------------------------+-----------------------------------------------------
 rep_id          | bigint                   | not null default nextval('replicated_id_seq'::text)
 rep_component   | character varying(100)   |
 rep_key1        | integer                  |
 rep_key2        | bigint                   |
 rep_key3        | smallint                 |
 rep_replicated  | timestamp with time zone |
 rep_remotekey1  | integer                  |
 rep_remotekey2  | bigint                   |
 rep_remotekey3  | smallint                 |
 rep_key2b       | bigint                   |
 rep_remotekey2b | bigint                   |
 rep_key4        | text                     |
Indexes:
    "replicated_pkey" primary key, btree (rep_id)
    "replicate_key1_idx" btree (rep_key1, rep_key2, rep_key3)
    "replicated_item2_idx" btree (rep_component, rep_key2, rep_key3)
    "replicated_item_idx" btree (rep_component, rep_key1, rep_key2, rep_key3)
    "replicated_key2_idx" btree (rep_key2, rep_key3)
    "replicated_key4_idx" btree (rep_key4)

=> analyze verbose replicated;
INFO:  analyzing "public.replicated"
INFO:  "replicated": 362140 pages, 30000 rows sampled, 45953418 estimated total rows
ANALYZE

The following does not use an index, even though two are available for the
specific selection of rep_component.

=> explain analyze select * from replicated where rep_component = 'ps_probe' limit 1;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.23 rows=1 width=101) (actual time=34401.857..34401.859 rows=1 loops=1)
   ->  Seq Scan on replicated  (cost=0.00..936557.70 rows=4114363 width=101) (actual time=34401.849..34401.849 rows=1 loops=1)
         Filter: ((rep_component)::text = 'ps_probe'::text)
Total runtime: 34401.925 ms
(4 rows)

Yet, if I do the following, an index will be used, and it runs much
faster (even when I swapped the order of the execution).

=> explain analyze select * from replicated where rep_component = 'ps_probe' order by rep_component limit 1;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.66 rows=1 width=101) (actual time=51.163..51.165 rows=1 loops=1)
   ->  Index Scan using replicated_item2_idx on replicated  (cost=0.00..6838123.76 rows=4114363 width=101) (actual time=51.157..51.157 rows=1 loops=1)
         Index Cond: ((rep_component)::text = 'ps_probe'::text)
Total runtime: 51.265 ms
(4 rows)

Any reason why the index is not chosen? Maybe I need to up the number of
rows sampled for statistics?

Regards!
Ed

Nov 23 '05 #1

Manfred Koizar

On Thu, 29 Apr 2004 09:48:10 -0400 (EDT), Edmund Dengler
<ed*****@eSentire.com> wrote:

> => explain analyze select * from replicated where rep_component = 'ps_probe' limit 1;
> -------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=0.00..0.23 rows=1 width=101) (actual time=34401.857..34401.859 rows=1 loops=1)
>   ->  Seq Scan on replicated  (cost=0.00..936557.70 rows=4114363 width=101) (actual time=34401.849..34401.849 rows=1 loops=1)
                                      ^^^^
>         Filter: ((rep_component)::text = 'ps_probe'::text)

The planner thinks that the seq scan has a startup cost of 0.00, i.e.
that it can return the first tuple immediately, which is obviously not
true in the presence of a filter condition. Unfortunately there's no
easy way to fix this, because the statistics information does not have
information about the physical position of tuples with certain values.

> => explain analyze select * from replicated where rep_component = 'ps_probe' order by rep_component limit 1;

This is a good workaround. It makes the plan for a seq scan look like

| Limit  (cost=2345679.00..2345679.20 rows=1 width=101)
|   ->  Sort  (2345678.90..2500000.00 rows=4114363 width=101)
|         ->  Seq Scan on replicated  (cost=0.00..936557.70 rows=4114363 width=101)
|               Filter: ((rep_component)::text = 'ps_probe'::text)

which is a loser against the index scan:

> Limit  (cost=0.00..1.66 rows=1 width=101) (actual time=51.163..51.165 rows=1 loops=1)

> Maybe I need to up the number of rows sampled for statistics?

Won't help, IMHO.

Servus
Manfred

Nov 23 '05 #2

Edmund Dengler

Hmm, interesting as I have that table clustered starting with the
rep_component, so 'ps_probe' will definitely appear later in a sequential
scan. So why does the <order by> force the use of the index?

Regards!
Ed

On Thu, 29 Apr 2004, Tom Lane wrote:

> Manfred Koizar <mk*****@aon.at> writes:
> > The planner thinks that the seq scan has a startup cost of 0.00, i.e.
> > that it can return the first tuple immediately, which is obviously not
> > true in the presence of a filter condition.
>
> Not really --- the startup cost is really defined as "cost expended
> before we can start scanning for results". The estimated cost to select
> N tuples is actually "startup_cost + N*(total_cost-startup_cost)/M",
> where M is the estimated total rows returned. This is why the LIMIT
> shows a nonzero estimate for the cost to fetch 1 row.
>
> > Unfortunately there's no
> > easy way to fix this, because the statistics information does not have
> > information about the physical position of tuples with certain values.
>
> Yeah, I think the real problem is that the desired rows are not
> uniformly distributed, and in fact there are none near the start of the
> table. We do not keep stats detailed enough to let the planner discover
> this, so it has to estimate on the assumption of uniform distribution.
> On that assumption, it looks like a seqscan will hit a suitable tuple
> quickly enough to be faster than using the index.
>
> regards, tom lane

Nov 23 '05 #4

Tom Lane

Manfred Koizar <mk*****@aon.at> writes:

> The planner thinks that the seq scan has a startup cost of 0.00, i.e.
> that it can return the first tuple immediately, which is obviously not
> true in the presence of a filter condition.

Not really --- the startup cost is really defined as "cost expended
before we can start scanning for results". The estimated cost to select
N tuples is actually "startup_cost + N*(total_cost-startup_cost)/M",
where M is the estimated total rows returned. This is why the LIMIT
shows a nonzero estimate for the cost to fetch 1 row.
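
As a rough check, the formula above can be applied to the two plans from the first post. This is a minimal Python sketch, not planner code: the function name `limit_cost` is made up for illustration, and the inputs are simply the cost and row estimates copied from the EXPLAIN ANALYZE output.

```python
def limit_cost(startup_cost, total_cost, est_rows, n):
    """Estimated cost to fetch the first n of est_rows rows, using
    startup_cost + n * (total_cost - startup_cost) / est_rows."""
    return startup_cost + n * (total_cost - startup_cost) / est_rows


# Estimates copied from the two EXPLAIN ANALYZE outputs in the first post.
seq_scan = limit_cost(0.00, 936557.70, 4114363, 1)
index_scan = limit_cost(0.00, 6838123.76, 4114363, 1)

print(f"seq scan,   LIMIT 1: {seq_scan:.2f}")    # 0.23, the Limit cost in the first plan
print(f"index scan, LIMIT 1: {index_scan:.2f}")  # 1.66, the Limit cost in the second plan
```

Both results agree with the upper cost bound shown on the Limit nodes, which is why the seq scan looks cheaper to the planner for a one-row fetch.
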
> Unfortunately there's no
> easy way to fix this, because the statistics information does not have
> information about the physical position of tuples with certain values.

Yeah, I think the real problem is that the desired rows are not
uniformly distributed, and in fact there are none near the start of the
table. We do not keep stats detailed enough to let the planner discover
this, so it has to estimate on the assumption of uniform distribution.
On that assumption, it looks like a seqscan will hit a suitable tuple
quickly enough to be faster than using the index.
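
To put rough numbers on this, here is a small Python sketch. It is only an illustration: the helper names are invented, the row counts come from the ANALYZE and EXPLAIN output in the first post, and the clustered case assumes the worst, namely that every matching row sits at the very end of the table.

```python
def rows_read_if_uniform(total_rows, matching_rows):
    """Rows a seq scan reads per match if matches are spread evenly."""
    return total_rows / matching_rows


def rows_read_if_matches_last(total_rows, matching_rows):
    """Rows read up to and including the first match when all matches
    sit at the end of the table (an assumed worst case)."""
    return total_rows - matching_rows + 1


total_rows = 45953418      # "estimated total rows" from ANALYZE VERBOSE
matching_rows = 4114363    # planner's row estimate for rep_component = 'ps_probe'

print(round(rows_read_if_uniform(total_rows, matching_rows), 1))
print(rows_read_if_matches_last(total_rows, matching_rows))
```

Under the uniform assumption the scan expects to read about 11 rows before finding a match; in the assumed worst case it reads roughly 41.8 million.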

regards, tom lane

Nov 23 '05 #6

Tom Lane

Edmund Dengler <ed*****@eSentire.com> writes:

> Hmm, interesting as I have that table clustered starting with the
> rep_component, so 'ps_probe' will definitely appear later in a sequential
> scan. So why does the <order by> force the use of the index?

It does not "force" anything, it simply alters the cost estimates. The
seqscan-based plan requires an extra sort step to meet the ORDER BY,
while the indexscan plan does not. In this particular scenario the
indexscan plan is estimated to beat seqscan+sort, but in other cases the
opposite decision might be made.
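
A minimal sketch of that comparison, in Python and under stated assumptions: the function name is hypothetical, the sort's own overhead is set to zero so the seqscan+sort figure is only a lower bound, and the cost numbers are the estimates from the EXPLAIN output earlier in the thread. The point it illustrates is that a sort cannot return its first row until it has read all of its input, so its startup cost is at least the total cost of the scan beneath it.

```python
def cost_of_first_rows(startup_cost, total_cost, est_rows, n):
    """Cost to fetch n rows: startup_cost + n*(total_cost-startup_cost)/est_rows."""
    return startup_cost + n * (total_cost - startup_cost) / est_rows


EST_ROWS = 4114363
SEQ_SCAN_TOTAL = 936557.70      # seq scan total cost from the thread
INDEX_SCAN_TOTAL = 6838123.76   # index scan total cost from the thread

# A sort must consume its whole input before emitting anything, so its
# startup cost is at least the seq scan's total cost. The sort's own
# work is assumed to be 0 here, which makes this a lower bound.
sort_startup = SEQ_SCAN_TOTAL
sort_total = SEQ_SCAN_TOTAL

candidates = {
    "seqscan + sort": cost_of_first_rows(sort_startup, sort_total, EST_ROWS, 1),
    "indexscan": cost_of_first_rows(0.00, INDEX_SCAN_TOTAL, EST_ROWS, 1),
}
cheapest = min(candidates, key=candidates.get)
print(cheapest, round(candidates[cheapest], 2))
```

With LIMIT 1 the index scan wins by a wide margin; with a much larger LIMIT, or none at all, the same arithmetic can tip the other way.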

regards, tom lane

Nov 23 '05 #8

Greg Stark

Tom Lane <tg*@sss.pgh.pa.us> writes:

> > Unfortunately there's no easy way to fix this, because the statistics
> > information does not have information about the physical position of
> > tuples with certain values.
>
> Yeah, I think the real problem is that the desired rows are not
> uniformly distributed, and in fact there are none near the start of the
> table. We do not keep stats detailed enough to let the planner discover
> this, so it has to estimate on the assumption of uniform distribution.
> On that assumption, it looks like a seqscan will hit a suitable tuple
> quickly enough to be faster than using the index.

It seems like this is another scenario where it would be helpful to have the
optimizer keep track of not just the average expected cost but also the
worst-case cost, since the index scan in this case might have a higher
expected cost but a lower worst-case cost than the sequential scan.

For some applications the best bet may in fact be to go with the plan expected
to be fastest. But for others it would be more important to go with the plan
that is least likely to perform badly, even if it means paying a performance
penalty to avoid the risk.
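
A toy sketch of that idea, in Python. Nothing here reflects how the PostgreSQL planner works: the `Plan` class, both selection functions, and the worst-case figures are assumptions chosen for illustration (the seq scan's worst case is taken to be its full estimated cost, and the index scan's worst case is taken to equal its expected cost for one row).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    expected_cost: float    # planner-style estimate for LIMIT 1
    worst_case_cost: float  # hypothetical upper bound, assumed for illustration


def pick_fastest_expected(plans):
    """Choose the plan with the lowest expected cost."""
    return min(plans, key=lambda p: p.expected_cost)


def pick_safest(plans):
    """Choose the plan with the lowest worst-case cost."""
    return min(plans, key=lambda p: p.worst_case_cost)


plans = [
    Plan("seq scan", expected_cost=0.23, worst_case_cost=936557.70),
    Plan("index scan", expected_cost=1.66, worst_case_cost=1.66),
]

print(pick_fastest_expected(plans).name)  # seq scan
print(pick_safest(plans).name)            # index scan
```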

--
greg

Nov 23 '05 #10
