Bytes | Software Development & Data Engineering Community

Indexes?

Hello all,

I have a question related to the use of indexes. One of my tables is part of
a census, with over 2.5 million records in it. Clearly, just about any query
takes a rather long time to complete. I was hoping to reduce that time by
using indexes, but no matter what I do, PostgreSQL never seems to use them.
It would seem to make sense that if I calculate something grouped by a set of
columns that have an index created on them, the index should be used rather
than a sequential scan, but it isn't. I have a table with an index such as
the one created below:

CREATE INDEX tbl_censo_poblacion_1993_manzana_idx ON tbl_censo_poblacion_1993
( dubicacion, zona, manzana );

Then I try a query such as:

select count(*) as POBLACION, (dubicacion || zona || manzana) as COD_MANZANA
from tbl_censo_poblacion_1993 group by dubicacion, zona, manzana;

The results from EXPLAIN indicate that a sequential scan is used (as far as I
can tell). I tried adding a WHERE clause:

select count(*) as POBLACION, (dubicacion || zona || manzana) as COD_MANZANA
from tbl_censo_poblacion_1993 where dubicacion <> '' and zona <> '' and
manzana <> '' group by dubicacion, zona, manzana;

The EXPLAIN analysis results appear as follows:

Aggregate  (cost=847587.90..879024.28 rows=251491 width=27) (actual time=272782.00..279458.00 rows=21459 loops=1)
  ->  Group  (cost=847587.90..872737.01 rows=2514911 width=27) (actual time=272782.00..278546.00 rows=2553015 loops=1)
        ->  Sort  (cost=847587.90..853875.18 rows=2514911 width=27) (actual time=272782.00..274533.00 rows=2553015 loops=1)
              Sort Key: dubicacion, zona, manzana
              ->  Seq Scan on tbl_censo_poblacion_1993  (cost=0.00..328346.76 rows=2514911 width=27) (actual time=0.00..189570.00 rows=2553015 loops=1)
                    Filter: ((dubicacion <> ''::character varying) AND (zona <> ''::character varying) AND (manzana <> ''::character varying))
Total runtime: 279494.00 msec
(7 rows)
So...can anyone suggest what I could do? I'm using PostgreSQL 7.3.4-2 under
Cygwin on a Windows XP platform.

As an aside, is there any way to increase the amount of memory allocated to
the postmaster.exe process? It seems to me that if I could give it more than
the 4MB it has by default, that might improve performance.

Any help is appreciated.

Kind regards,
Mike



---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #1
In general, the <> operator can't be indexed since it will match most rows,
in which case you're better off scanning the table anyway.

If you really run that query often, perhaps the answer is a partial index.
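A partial index along those lines might look like the sketch below. Note this is only an illustration (the index name is made up); the index predicate has to match the query's WHERE condition for the planner to consider it:

```sql
-- Sketch of a partial index: only rows with non-empty key columns are
-- indexed, so it stays smaller than a full index and can be used by
-- queries that repeat the same WHERE condition.
CREATE INDEX tbl_censo_poblacion_1993_manzana_part_idx
    ON tbl_censo_poblacion_1993 (dubicacion, zona, manzana)
    WHERE dubicacion <> '' AND zona <> '' AND manzana <> '';
```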

--
Martijn van Oosterhout <kl*****@svana.org>  http://svana.org/kleptog/

Nov 12 '05 #2
On Tue, 14 Oct 2003, Mike Leahy wrote:
-> Seq Scan on tbl_censo_poblacion_1993 (cost=0.00..328346.76 rows=2514911 width=27) (actual time=0.00..189570.00 rows=2553015 loops=1)
Note how many rows it's estimating (and actually) getting matches for.
It appears to be matching nearly every row so indexes aren't going to
really help much here for PostgreSQL since it still has to go back to the
actual table to find out whether or not the row is visible to your
transaction.

What does vacuum verbose tbl_censo_poblacion_1993; say? Specifically, how
many pages does the table have?
As an aside, is there any way to increase the amount of memory allocated to
the postmaster.exe process? It seems to me that if I could add more than 4MB
that it has by default, then maybe that could increase the performance.


You might want to raise both shared_buffers and sort_mem from their
default values in postgresql.conf.

For more information you should probably read:
http://www.varlena.com/varlena/Gener...bits/perf.html
http://www.varlena.com/varlena/Gener...ed_conf_e.html
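As a rough sketch, the relevant lines in postgresql.conf would look something like the following. The numbers here are illustrative, not recommendations -- tune them to your available RAM:

```
# postgresql.conf (7.3) -- illustrative values only
shared_buffers = 4096    # in 8 kB pages, so ~32 MB of shared cache
sort_mem = 16384         # in kB, per sort; helps the big GROUP BY sort
```

Remember that shared_buffers requires a postmaster restart, while sort_mem can be changed per session.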


Nov 12 '05 #3
Stephan, thanks for the response.

I increased the variables you suggested, and that seems to have increased
the memory allocated to the process in Windows.

Also, I tried the same query with some actual values specified in the WHERE
clause - that got it to use the indexes. The only thing is, I would normally
be joining such a statement to another table, in which case there wouldn't be
a WHERE clause. I don't think it uses indexes in that case, even if the
number of rows involved is a fraction of what's in the table.
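For what it's worth, the kind of query that did pick the index looked something like this (the literal values here are made up for illustration):

```sql
-- Hypothetical values: equality predicates on the leading index columns
-- are selective enough for the planner to choose an index scan.
EXPLAIN ANALYZE
SELECT count(*)
FROM tbl_censo_poblacion_1993
WHERE dubicacion = '150101' AND zona = '001' AND manzana = '001';
```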

Regarding the vacuum results, here they are:

INFO: --Relation public.tbl_censo_poblacion_1993--
INFO: Pages 283669: Changed 0, Empty 0; Tup 2553015: Vac 0, Keep 0, UnUsed 150106.
      Total CPU 5.89s/2.90u sec elapsed 56.52 sec.
VACUUM

What do you make of these results?

Thanks for your help.

Mike


Nov 12 '05 #4
On Wed, 15 Oct 2003, Mike Leahy wrote:
I increased those variables you suggested, and that seems to have increased
the memory allocated to the process in windows.

Also, I tried the same query I was using, but with some actual values
specified in the where statement - that got it to use the indexes. The only
thing is, I would normally be joining such a statement to another table, in
which case there wouldn't be a where statement. I don't think that it uses
indexes in that case, even if the number of rows being used are a fraction of
what's in the table.
You'll need to try it and give explain (analyze) results because joins
give their own chances for indexes to be used, so I wouldn't want to
speculate without more info.

One other thing to try is to run SET enable_seqscan = off; before the
EXPLAIN ANALYZE and compare the results to a run without the set. That
assigns a large cost penalty to choosing a seqscan. In the original query
it's entirely possible that the plan forced this way will be more expensive.
If it isn't (or it's really close), you may also want to look into lowering
random_page_cost in the postgresql.conf file.
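The experiment would look something like this. enable_seqscan only affects the current session, so it's safe to toggle for testing:

```sql
-- Bias the planner away from sequential scans, for comparison only;
-- don't leave this off for normal work.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT count(*), (dubicacion || zona || manzana) AS cod_manzana
FROM tbl_censo_poblacion_1993
GROUP BY dubicacion, zona, manzana;
SET enable_seqscan = on;
```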
Regarding the vacuum results, here they are:

INFO: --Relation public.tbl_censo_poblacion_1993--
INFO: Pages 283669: Changed 0, Empty 0; Tup 2553015: Vac 0, Keep 0, UnUsed 150106.
      Total CPU 5.89s/2.90u sec elapsed 56.52 sec.
VACUUM

What do you make of these results?


Well, that'd imply that the records are about 1 kB apiece on average
(283,669 pages x 8 kB is roughly 2.2 GB for 2,553,015 tuples, so about
0.9 kB per tuple), or you have dead space in the table. I'm still thinking
that 189 s to read 284k pages is a bit much (about 1.5x the time for an
equivalent number of pages on my unoptimized dev box), but I don't know how
good your hardware is, and you are running under Cygwin, which probably
doesn't help.

Nov 12 '05 #5
