By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,852 Members | 2,200 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,852 IT Pros & Developers. It's quick & easy.

slow query performance

P: n/a
I'm having severe performance issues with a conceptually simple
database. The database has one table, containing weather observations.
The table currently has about 13.5 million rows, and is being updated
constantly. (The database is running on a dual 550MHz PIII with 512MB RAM.
I have PostgreSQL 7.1.3 on i686-pc-linux-gnu, compiled by GCC 2.96
on RedHat 7.2)

On the whole, queries are of the form:

SELECT ? FROM obs WHERE station = ?
AND valid_time < ? AND valid_time > ?
or:
SELECT ? FROM obs WHERE station IN (?, ?, ...)
AND valid_time < ? AND valid_time > ?

Queries like these are taking around 4 to 5 minutes each, which seems
excessively slow to me (or are my expectations far too optimistic?).

For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.

An EXPLAIN of the above query says:
NOTICE: QUERY PLAN:

Index Scan using obs_pkey on obs (cost=0.00..9.01 rows=1 width=20)

A simple "SELECT count(*) from obs" query takes around that sort of time
too.

I have run "vacuumdb --analyze obs", to little effect.

How can I speed this up? Where am I going wrong? Is there a problem with
the table structure, or the indexes? Does the continual updating of the
database (at the rate of somewhere between 1-3 entries per second) cause
problems?

The table and indexes are defined as follows:
Table "obs"
Attribute | Type | Modifier
----------------------------+--------------------------+----------
valid_time | timestamp with time zone |
metar_air_temp | double precision |
relative_humidity | double precision |
pressure_change | double precision |
ceiling | double precision |
metar_dew_point | double precision |
metar_gusts | double precision |
wet_bulb_temperature | double precision |
past_weather | text |
visibility | double precision |
metar_visibility | double precision |
precipitation | double precision |
station | character(10) |
pressure_msl | double precision |
metar_min_temperature_6hr | double precision |
precipitation_period | double precision |
metar_wet_bulb | double precision |
saturation_mixing_ratio | double precision |
metar_pressure | double precision |
metar_sky_cover | text |
dew_point | double precision |
wind_direction | double precision |
actual_time | timestamp with time zone |
gust_speed | double precision |
high_cloud_type | text |
precipitation_24hr | double precision |
metar_precipitation_24hr | double precision |
pressure_tendency | text |
metar_relative_humidity | double precision |
low_cloud_type | text |
metar_max_temperature_6hr | double precision |
middle_cloud_type | text |
air_temp | double precision |
low_and_middle_cloud_cover | text |
metar_wind_dir | double precision |
metar_weather | text |
snow_depth | double precision |
metar_snow_depth | double precision |
min_temp_12hr | double precision |
present_weather | text |
wind_speed | double precision |
snow_cover | text |
metar_wind_speed | double precision |
metar_ceiling | double precision |
max_temp_12hr | double precision |
mixing_ratio | double precision |
pressure_change_3hr | double precision |
total_cloud | integer |
max_temp_24hr | double precision |
min_temp_24hr | double precision |
snow_amount_6hr | double precision |
Indices: obs_pkey,
obs_station,
obs_valid_time

Index "obs_pkey"
Attribute | Type
------------+--------------------------
valid_time | timestamp with time zone
station | character(10)
unique btree

Index "obs_station"
Attribute | Type
-----------+---------------
station | character(10)
btree

Index "obs_valid_time"
Attribute | Type
------------+--------------------------
valid_time | timestamp with time zone
btree

(I suspect the obs_valid_time index is redundant, because of the
obs_pkey index - is that right?)

I'd be grateful for any advice and any clues to help speed this up.
Many thanks,
Dave

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 12 '05 #1
Share this Question
Share on Google+
16 Replies


P: n/a
Dave Weaver wrote:
too optimistic?).

For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.

An EXPLAIN of the above query says:
NOTICE: QUERY PLAN:

Index Scan using obs_pkey on obs (cost=0.00..9.01 rows=1 width=20)

A simple "SELECT count(*) from obs" query takes around that sort of time
too.
That is no test. Postgresql will always fetch the entire table.
I have run "vacuumdb --analyze obs", to little effect.


Check http://www.varlena.com/varlena/Gener...bits/perf.html for general
tuning tips and do a vacuum full if table is updated frequently. Also reindex
the indexes after vacuum. It will take quite some time though.

See if that helps.

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #2

P: n/a
For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.


How many rows should that return?
[explain analyze will tell you that]

and while that runs is your disk thrashing? vmstat's bi/bo columns will
tell you.

7.1 is quite old, but I do understand the pain of migraing to 7.3 [or
..4beta] with huge db's

--
Jeff Trout <je**@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05 #3

P: n/a
Jeff wrote:
Dave Weaver wrote:
For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.
How many rows should that return?
[explain analyze will tell you that]


"explain analyze" doesn't seem to be part of this postgres version
(or I misunderstood something).
That particular query returned 24 rows.

and while that runs is your disk thrashing? vmstat's bi/bo columns will
tell you.
The machine's over the other side of the building, so I can't physically
see if the disk is thrashing.
I'm not sure how to interpret the vmstat output; running "vmstat 1" shows
me bi/bo both at zero (mostly) until I start the query. Then bi shoots up
to around 2500 (bo remains around zero) until the query finishes.

7.1 is quite old, but I do understand the pain of migraing to 7.3 [or
.4beta] with huge db's


Is the upgrade likely to make a difference?
I'm still none-the-wiser wether the problem I have is due to:
1 Postgres version
2 Database size
3 Table structure
4 Configuration issues
5 Slow hardware
6 All of the above
7 None of the above
8 Something else

Thanks for the help,
Dave.

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #4

P: n/a
On Thu, 30 Oct 2003 13:49:46 -0000
"Dave Weaver" <da***@wsieurope.com> wrote:
Jeff wrote:
Dave Weaver wrote:
For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.
How many rows should that return?
[explain analyze will tell you that]


"explain analyze" doesn't seem to be part of this postgres version
(or I misunderstood something).
That particular query returned 24 rows.

You run explain analyze [insert query here]

Post that output


The machine's over the other side of the building, so I can't
physically see if the disk is thrashing.
I'm not sure how to interpret the vmstat output; running "vmstat 1"
shows me bi/bo both at zero (mostly) until I start the query. Then bi
shoots up to around 2500 (bo remains around zero) until the query
finishes.


The BI column means it is reading 2500 blocks / second. This is
typically kB/sec (Linux defaults to 1kB block size on filesystems)

That seems pretty low.. even for an older disk.
We'll need the explain analyze output to help further.
--
Jeff Trout <je**@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #5

P: n/a
Jeff Wrote:
"Dave Weaver" <da***@wsieurope.com> wrote:

"explain analyze" doesn't seem to be part of this postgres version (or
I misunderstood something).
That particular query returned 24 rows.

You run explain analyze [insert query here]

Post that output


The output is probably not what you were expecting! :-)

obs=> explain analyze select * from obs where station = 'EGBB'
obs-> and valid_time > '28/8/03 00:00' and valid_time < '28/10/03 00:00';
ERROR: parser: parse error at or near "analyze"
obs=> \h explain
Command: EXPLAIN
Description: Shows statement execution plan
Syntax:
EXPLAIN [ VERBOSE ] query

As I said, "explain analyze" doesn't seem to be part of this postgres
version.

Dave.

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #6

P: n/a
On Thu, 30 Oct 2003, Dave Weaver wrote:
Jeff wrote:
Dave Weaver wrote:
For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.


How many rows should that return?
[explain analyze will tell you that]


"explain analyze" doesn't seem to be part of this postgres version
(or I misunderstood something).
That particular query returned 24 rows.


Back then it was just explain. explain analyze actually runs the query
and tells you how long each thing too etc... i.e. it gives you the "I
imagine I'll get this many rows back and it'll cost this much" part, then
the cold hard facts of how many rows really came back, and how long it
really too. Quite a nice improvement.
and while that runs is your disk thrashing? vmstat's bi/bo columns will
tell you.


The machine's over the other side of the building, so I can't physically
see if the disk is thrashing.
I'm not sure how to interpret the vmstat output; running "vmstat 1" shows
me bi/bo both at zero (mostly) until I start the query. Then bi shoots up
to around 2500 (bo remains around zero) until the query finishes.


Your disk is likely trashing.

Can you set sort_mem on that old version of pgsql to something higher?

set sort_mem = 32768;

or something similar?
7.1 is quite old, but I do understand the pain of migraing to 7.3 [or
.4beta] with huge db's


Is the upgrade likely to make a difference?
I'm still none-the-wiser wether the problem I have is due to:
1 Postgres version
2 Database size
3 Table structure
4 Configuration issues
5 Slow hardware
6 All of the above
7 None of the above
8 Something else


Yes, the upgrade is very likely to make a difference. The average
performance gain for each version since then has been, in my experience,
anywhere from a few percentage points faster to many times faster,
depending on what you were trying to do.

Why not download 7.4beta5 and see if you can get it to import the data
from 7.1.3? It's close to going production, and in my opinion, 7.4beta5
is probably at least as stable as 7.1.3 considering the number of unfixed
bugs likely to be hiding out there. My guess is that you'll find your
workstation running 74beta5 with one IDE hard drive outrunning your server
with 7.1.3 on it. Seriously.

We're running 7.2.4 where I work, and the change from 7.1 to 7.2 was huge
for us, especially the non-full vacuums.
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 12 '05 #7

P: n/a

On Thu, 30 Oct 2003, Dave Weaver wrote:
Jeff wrote:
Dave Weaver wrote:
For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.


How many rows should that return?
[explain analyze will tell you that]


"explain analyze" doesn't seem to be part of this postgres version
(or I misunderstood something).
That particular query returned 24 rows.

and while that runs is your disk thrashing? vmstat's bi/bo columns will
tell you.


The machine's over the other side of the building, so I can't physically
see if the disk is thrashing.
I'm not sure how to interpret the vmstat output; running "vmstat 1" shows
me bi/bo both at zero (mostly) until I start the query. Then bi shoots up
to around 2500 (bo remains around zero) until the query finishes.

7.1 is quite old, but I do understand the pain of migraing to 7.3 [or
.4beta] with huge db's


Is the upgrade likely to make a difference?


Well, it's likely to get you better help. Explain Analyze (added in 7.2
IIRC) gets us information on the real time spent in operations as well as
the real number of rows.

But, back to the question, what is the definition of the index it's using?
If you don't have already have an index on (station,valid_time does
making one help?

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #8

P: n/a
"Dave Weaver" <da***@wsieurope.com> writes:
Is the upgrade likely to make a difference?


I'm not sure if it would help for this specific query, but in general
each major PG release has useful performance improvements over the
previous one.

What I'm wondering about is an index-bloat problem (see the
pgsql-performance archives for discussions). Do you do a lot of updates
or deletes on this table, or is it just inserts? What is the physical
size of the table and its index? The output of VACUUM VERBOSE for this
table would be useful to show.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #9

P: n/a
Tom Lane wrote:
Do you do a lot of updates or deletes on this table, or is it just
inserts?
Inserts and updates. No deletes.

What is the physical size of the table and its index?
How do I find out this information?

The output of VACUUM VERBOSE for this table would be useful to show.


obs=> vacuum verbose obs;
NOTICE: --Relation obs--
NOTICE: Pages 276896: Changed 2776, reaped 67000, Empty 0, New 0;s
Tup 13739326: Vac 78031, Keep/VTL 3141/3141, Crash 0, UnUsed 303993,
MinLen 72, MaxLen 476; Re-using: Free/Avail. Space 16174372/14995020;
EndEmpty/Avail. Pages 0/18004. CPU 26.11s/3.78u sec.
NOTICE: Index obs_pkey: Pages 114870; Tuples 13739326: Deleted 37445. CPU
12.33s/39.86u sec.
NOTICE: Index obs_valid_time: Pages 45713; Tuples 13739326: Deleted 37445.
CPU 4.38s/37.65u sec.
NOTICE: InvalidateSharedInvalid: cache state reset
NOTICE: Index obs_station: Pages 53170; Tuples 13739326: Deleted 37445. CPU
6.46s/56.63u sec.
NOTICE: Rel obs: Pages: 276896 --> 275200; Tuple(s) moved: 30899. CPU
33.94s/51.05u sec.
NOTICE: Index obs_pkey: Pages 114962; Tuples 13739326: Deleted 30881. CPU
13.24s/19.80u sec.
NOTICE: Index obs_valid_time: Pages 45819; Tuples 13739326: Deleted 30881.
CPU 4.51s/17.42u sec.
NOTICE: Index obs_station: Pages 53238; Tuples 13739326: Deleted 30881. CPU
5.78s/18.33u sec.
NOTICE: --Relation pg_toast_503832058--
NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail.
Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE: Index pg_toast_503832058_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u
sec.
VACUUM
obs=>
Cheers,
Dave.

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #10

P: n/a
Stephan Szabo wrote:
But, back to the question, what is the definition of the index it's using?
If you don't have already have an index on (station,valid_time does
making one help?


From my original post:

Index "obs_pkey"
Attribute | Type
------------+--------------------------
valid_time | timestamp with time zone
station | character(10)
unique btree

Index "obs_station"
Attribute | Type
-----------+---------------
station | character(10)
btree

Index "obs_valid_time"
Attribute | Type
------------+--------------------------
valid_time | timestamp with time zone
btree

(I suspect the obs_valid_time index is redundant, because of the obs_pkey
index - is that right?)

Cheers,
Dave.

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #11

P: n/a
Dave Weaver wrote:
Tom Lane wrote:
Do you do a lot of updates or deletes on this table, or is it just
inserts?

Inserts and updates. No deletes.


Updates are insert/deletes under postgresql as it does not updates rows in place.
What is the physical size of the table and its index?

How do I find out this information?


cd $PGDATA;du -h

This will give you size of each directory. Using utility oid2name in contrib
module in sources, you can find out what object is stored in which file. Same is
true for data files under it as well.
obs=> vacuum verbose obs;
NOTICE: --Relation obs--
NOTICE: Pages 276896: Changed 2776, reaped 67000, Empty 0, New 0;s
Tup 13739326: Vac 78031, Keep/VTL 3141/3141, Crash 0, UnUsed 303993,
MinLen 72, MaxLen 476; Re-using: Free/Avail. Space 16174372/14995020;
EndEmpty/Avail. Pages 0/18004. CPU 26.11s/3.78u sec.
NOTICE: Index obs_pkey: Pages 114870; Tuples 13739326: Deleted 37445. CPU
12.33s/39.86u sec.
NOTICE: Index obs_valid_time: Pages 45713; Tuples 13739326: Deleted 37445.
CPU 4.38s/37.65u sec.
NOTICE: InvalidateSharedInvalid: cache state reset
NOTICE: Index obs_station: Pages 53170; Tuples 13739326: Deleted 37445. CPU
6.46s/56.63u sec.
NOTICE: Rel obs: Pages: 276896 --> 275200; Tuple(s) moved: 30899. CPU
33.94s/51.05u sec.
NOTICE: Index obs_pkey: Pages 114962; Tuples 13739326: Deleted 30881. CPU
13.24s/19.80u sec.
NOTICE: Index obs_valid_time: Pages 45819; Tuples 13739326: Deleted 30881.
CPU 4.51s/17.42u sec.
NOTICE: Index obs_station: Pages 53238; Tuples 13739326: Deleted 30881. CPU
5.78s/18.33u sec.
NOTICE: --Relation pg_toast_503832058--
NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail.
Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE: Index pg_toast_503832058_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u
sec.
VACUUM


You reindexed? Seems like this is after you have already run a vacuum. So not
much change is shown here.

HTH

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 12 '05 #12

P: n/a
Dave,

Apologies if this has been suggested before, but maybe :

- interchanging the key order for the "obs_pkey" index and
- clustering the "obs" table on "station"

might make these queries go a bit better?

Alternatively if maintaining a cluster on station is infeasable, you
could consider a collection of partial indexes on valid_time for each
station:

create index obs_valid_time _stat1 on obs(valid_time) where station =
'station 1';

(etc for each station)...

regards

Mark
Dave Weaver wrote:
On the whole, queries are of the form:

SELECT ? FROM obs WHERE station = ?
AND valid_time < ? AND valid_time > ?
or:
SELECT ? FROM obs WHERE station IN (?, ?, ...)
AND valid_time < ? AND valid_time > ?

An EXPLAIN of the above query says:
NOTICE: QUERY PLAN:

Index Scan using obs_pkey on obs (cost=0.00..9.01 rows=1 width=20)

Index "obs_pkey"
Attribute | Type
------------+--------------------------
valid_time | timestamp with time zone
station | character(10)
unique btree

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #13

P: n/a
Mark wrote:
- interchanging the key order for the "obs_pkey" index and
I will try this.
- clustering the "obs" table on "station"
Sorry, I don't understand what you mean by this - can you explain?

create index obs_valid_time _stat1 on obs(valid_time) where station = 'station 1';


Interesting. I didn't know you could do that (I have a lot to learn!).
There are about 13500 stations, so that's a lot of indexes.

Hang on - just tried it and I get a 'parse error at or near "where"', so
I suspect that's yet another reason to upgrade...

I think the one message that's come across loud and clear from this
thread is "Upgrade!", so I guess that's what I'll be doing!

Many thanks to all who have contributed to this thread so far.
I really appreciate al hte help.

Cheers,
Dave.
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #14

P: n/a
"Dave Weaver" <da***@wsieurope.com> writes:
The output of VACUUM VERBOSE for this table would be useful to show.
NOTICE: Index obs_pkey: Pages 114962; Tuples 13739326: Deleted 30881. CPU
13.24s/19.80u sec.


Lets see, 114962 pages at 8K apiece, divided by 13739326 entries, gives
about 68 bytes per index entry. The entries themselves (contents a
timestamp and a char(10)) take 8 + 4 + 10 bytes for data, plus 8 bytes
for the index tuple header, plus 2 bytes wasted for alignment (assuming
this is Intel hardware), plus a 4-byte line pointer; 36 bytes total.
So you have an index loading factor of about 52%, which is noticeably
less than the theoretical optimum of 70%, though not really bad yet.
I think you are seeing some index bloat --- especially if you
recently reindexed, meaning that the index hasn't had very long to
grow. You could try keeping an eye on the size of obs_pkey over time
and see if it grows faster than the table itself.

If you are going to upgrade I'd counsel going to 7.4, which should solve
or at least greatly reduce the problem of index bloat. See the
pgsql-performance archives for more discussion.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #15

P: n/a

Dave Weaver wrote:
- clustering the "obs" table on "station"


Sorry, I don't understand what you mean by this - can you explain?

Supposing obs_pkey is on (station, valid_time):

cluster obs_pkey on obs

will re-order the rows in obs based on the index obs_pkey. (This is
clustering on 'station' and 'valid_time', to do just station you could
use an index on just 'station').

The down side is that the row ordering is slowly lost as rows are
updated, so periodic running of the cluster command is needed - this is
a pain as it will take a while for 13 million row table.

regards

Mark
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #16

P: n/a
On Sat, Nov 01, 2003 at 10:18:14AM +1300, Mark Kirkwood wrote:

Dave Weaver wrote:
- clustering the "obs" table on "station"
Sorry, I don't understand what you mean by this - can you explain?

Supposing obs_pkey is on (station, valid_time):

cluster obs_pkey on obs


Be aware that doing this on the 7.1.3 version you are running will drop
the other indexes on the table, and some other metadata about it (grants,
inheritance, foreign key relationships IIRC).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Coge la flor que hoy nace alegre, ufana. ¿Quién sabe si nacera otra mañana?"

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #17

This discussion thread is closed

Replies have been disabled for this discussion.