
slow query performance

I'm having severe performance issues with a conceptually simple
database. The database has one table, containing weather observations.
The table currently has about 13.5 million rows, and is being updated
constantly. (The database is running on a dual 550MHz PIII with 512MB RAM.
I have PostgreSQL 7.1.3 on i686-pc-linux-gnu, compiled by GCC 2.96
on RedHat 7.2)

On the whole, queries are of the form:

SELECT ? FROM obs WHERE station = ?
AND valid_time < ? AND valid_time > ?
or:
SELECT ? FROM obs WHERE station IN (?, ?, ...)
AND valid_time < ? AND valid_time > ?

Queries like these are taking around 4 to 5 minutes each, which seems
excessively slow to me (or are my expectations far too optimistic?).

For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.

An EXPLAIN of the above query says:
NOTICE: QUERY PLAN:

Index Scan using obs_pkey on obs (cost=0.00..9.01 rows=1 width=20)

A simple "SELECT count(*) from obs" query takes around that sort of time
too.

I have run "vacuumdb --analyze obs", to little effect.

How can I speed this up? Where am I going wrong? Is there a problem with
the table structure, or the indexes? Does the continual updating of the
database (at the rate of somewhere between 1-3 entries per second) cause
problems?

The table and indexes are defined as follows:
Table "obs"
Attribute | Type | Modifier
----------------------------+--------------------------+----------
valid_time | timestamp with time zone |
metar_air_temp | double precision |
relative_humidi ty | double precision |
pressure_change | double precision |
ceiling | double precision |
metar_dew_point | double precision |
metar_gusts | double precision |
wet_bulb_temper ature | double precision |
past_weather | text |
visibility | double precision |
metar_visibilit y | double precision |
precipitation | double precision |
station | character(10) |
pressure_msl | double precision |
metar_min_tempe rature_6hr | double precision |
precipitation_p eriod | double precision |
metar_wet_bulb | double precision |
saturation_mixi ng_ratio | double precision |
metar_pressure | double precision |
metar_sky_cover | text |
dew_point | double precision |
wind_direction | double precision |
actual_time | timestamp with time zone |
gust_speed | double precision |
high_cloud_type | text |
precipitation_2 4hr | double precision |
metar_precipita tion_24hr | double precision |
pressure_tenden cy | text |
metar_relative_ humidity | double precision |
low_cloud_type | text |
metar_max_tempe rature_6hr | double precision |
middle_cloud_ty pe | text |
air_temp | double precision |
low_and_middle_ cloud_cover | text |
metar_wind_dir | double precision |
metar_weather | text |
snow_depth | double precision |
metar_snow_dept h | double precision |
min_temp_12hr | double precision |
present_weather | text |
wind_speed | double precision |
snow_cover | text |
metar_wind_spee d | double precision |
metar_ceiling | double precision |
max_temp_12hr | double precision |
mixing_ratio | double precision |
pressure_change _3hr | double precision |
total_cloud | integer |
max_temp_24hr | double precision |
min_temp_24hr | double precision |
snow_amount_6hr | double precision |
Indices: obs_pkey,
obs_station,
obs_valid_time

Index "obs_pkey"
Attribute | Type
------------+--------------------------
valid_time | timestamp with time zone
station | character(10)
unique btree

Index "obs_statio n"
Attribute | Type
-----------+---------------
station | character(10)
btree

Index "obs_valid_time "
Attribute | Type
------------+--------------------------
valid_time | timestamp with time zone
btree

(I suspect the obs_valid_time index is redundant, because of the
obs_pkey index - is that right?)

I'd be grateful for any advice and any clues to help speed this up.
Many thanks,
Dave


Nov 12 '05 #1
Dave Weaver wrote:
too optimistic?).

For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.

An EXPLAIN of the above query says:
NOTICE: QUERY PLAN:

Index Scan using obs_pkey on obs (cost=0.00..9.01 rows=1 width=20)

A simple "SELECT count(*) from obs" query takes around that sort of time
too.
That is no test. PostgreSQL must always fetch the entire table to count its rows.
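
(A hedged aside: if only a ballpark figure is needed, the planner's statistics
avoid the scan entirely — this is an estimate refreshed by VACUUM/ANALYZE, not
an exact count:)

SELECT reltuples FROM pg_class WHERE relname = 'obs';
-- estimated row count kept in the system catalog; no table scan needed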
I have run "vacuumdb --analyze obs", to little effect.


Check http://www.varlena.com/varlena/Gener...bits/perf.html for general
tuning tips, and do a full vacuum if the table is updated frequently. Also reindex
the indexes after the vacuum. It will take quite some time, though.
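
(Concretely, something like the following — hedged, since the FULL keyword
only appeared around 7.2; on 7.1 a plain VACUUM already compacts the table,
and both commands lock it while they run:)

VACUUM VERBOSE obs;  -- on 7.1, plain VACUUM reclaims and compacts dead rows
REINDEX TABLE obs;   -- rebuilds the indexes, which VACUUM does not shrink
                     -- (7.1-era REINDEX may require the FORCE keyword)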

See if that helps.

Shridhar

Nov 12 '05 #2
For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.


How many rows should that return?
[explain analyze will tell you that]

and while that runs is your disk thrashing? vmstat's bi/bo columns will
tell you.

7.1 is quite old, but I do understand the pain of migrating to 7.3 [or
7.4beta] with huge DBs

--
Jeff Trout <je**@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/


Nov 12 '05 #3
Jeff wrote:
Dave Weaver wrote:
For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.
How many rows should that return?
[explain analyze will tell you that]


"explain analyze" doesn't seem to be part of this postgres version
(or I misunderstood something).
That particular query returned 24 rows.

and while that runs is your disk thrashing? vmstat's bi/bo columns will
tell you.
The machine's over the other side of the building, so I can't physically
see if the disk is thrashing.
I'm not sure how to interpret the vmstat output; running "vmstat 1" shows
me bi/bo both at zero (mostly) until I start the query. Then bi shoots up
to around 2500 (bo remains around zero) until the query finishes.

7.1 is quite old, but I do understand the pain of migrating to 7.3 [or
7.4beta] with huge DBs


Is the upgrade likely to make a difference?
I'm still none the wiser whether the problem I have is due to:
1 Postgres version
2 Database size
3 Table structure
4 Configuration issues
5 Slow hardware
6 All of the above
7 None of the above
8 Something else

Thanks for the help,
Dave.

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #4
On Thu, 30 Oct 2003 13:49:46 -0000
"Dave Weaver" <da***@wsieurop e.com> wrote:
Jeff wrote:
Dave Weaver wrote:
For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.
How many rows should that return?
[explain analyze will tell you that]


"explain analyze" doesn't seem to be part of this postgres version
(or I misunderstood something).
That particular query returned 24 rows.

You run explain analyze [insert query here]

Post that output
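
(That is, on a version that supports it — as the next reply shows, 7.1 does not:)

EXPLAIN ANALYZE
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00';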


The machine's over the other side of the building, so I can't
physically see if the disk is thrashing.
I'm not sure how to interpret the vmstat output; running "vmstat 1"
shows me bi/bo both at zero (mostly) until I start the query. Then bi
shoots up to around 2500 (bo remains around zero) until the query
finishes.


The BI column means it is reading 2500 blocks/second. This is
typically kB/sec (Linux defaults to a 1 kB filesystem block size).

That seems pretty low, even for an older disk.
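
(A rough sanity check, taking bi as kB/s: 2500 kB/s sustained over the
4 min 32 s runtime is about 680 MB read to return 24 rows — far more I/O
than a tight index probe should need.)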
We'll need the explain analyze output to help further.
--
Jeff Trout <je**@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/


Nov 12 '05 #5
Jeff Wrote:
"Dave Weaver" <da***@wsieurop e.com> wrote:

"explain analyze" doesn't seem to be part of this postgres version (or
I misunderstood something).
That particular query returned 24 rows.

You run explain analyze [insert query here]

Post that output


The output is probably not what you were expecting! :-)

obs=> explain analyze select * from obs where station = 'EGBB'
obs-> and valid_time > '28/8/03 00:00' and valid_time < '28/10/03 00:00';
ERROR: parser: parse error at or near "analyze"
obs=> \h explain
Command: EXPLAIN
Description: Shows statement execution plan
Syntax:
EXPLAIN [ VERBOSE ] query

As I said, "explain analyze" doesn't seem to be part of this postgres
version.

Dave.


Nov 12 '05 #6
On Thu, 30 Oct 2003, Dave Weaver wrote:
Jeff wrote:
Dave Weaver wrote:
For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.


How many rows should that return?
[explain analyze will tell you that]


"explain analyze" doesn't seem to be part of this postgres version
(or I misunderstood something).
That particular query returned 24 rows.


Back then it was just explain. explain analyze actually runs the query
and tells you how long each thing took, etc... i.e. it gives you the "I
imagine I'll get this many rows back and it'll cost this much" part, then
the cold hard facts of how many rows really came back, and how long it
really took. Quite a nice improvement.
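
(For illustration only — the timings below are made up, but on 7.2+ each
plan node gains an "actual" section next to the estimate, so the planner's
rows=1 guess can be compared against the real rows=24 from this thread:)

Index Scan using obs_pkey on obs (cost=0.00..9.01 rows=1 width=20)
  (actual time=0.15..272000.00 rows=24 loops=1)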
and while that runs is your disk thrashing? vmstat's bi/bo columns will
tell you.


The machine's over the other side of the building, so I can't physically
see if the disk is thrashing.
I'm not sure how to interpret the vmstat output; running "vmstat 1" shows
me bi/bo both at zero (mostly) until I start the query. Then bi shoots up
to around 2500 (bo remains around zero) until the query finishes.


Your disk is likely thrashing.

Can you set sort_mem on that old version of pgsql to something higher?

set sort_mem = 32768;

or something similar?
7.1 is quite old, but I do understand the pain of migrating to 7.3 [or
7.4beta] with huge DBs


Is the upgrade likely to make a difference?
I'm still none the wiser whether the problem I have is due to:
1 Postgres version
2 Database size
3 Table structure
4 Configuration issues
5 Slow hardware
6 All of the above
7 None of the above
8 Something else


Yes, the upgrade is very likely to make a difference. The average
performance gain for each version since then has been, in my experience,
anywhere from a few percentage points faster to many times faster,
depending on what you were trying to do.

Why not download 7.4beta5 and see if you can get it to import the data
from 7.1.3? It's close to going production, and in my opinion, 7.4beta5
is probably at least as stable as 7.1.3, considering the number of unfixed
bugs likely to be hiding out there. My guess is that you'll find your
workstation running 7.4beta5 with one IDE hard drive outrunning your server
with 7.1.3 on it. Seriously.

We're running 7.2.4 where I work, and the change from 7.1 to 7.2 was huge
for us, especially the non-full vacuums.

Nov 12 '05 #7

On Thu, 30 Oct 2003, Dave Weaver wrote:
Jeff wrote:
Dave Weaver wrote:
For instance:
SELECT station, air_temp FROM obs
WHERE station = 'EGBB'
AND valid_time > '28/8/03 00:00'
AND valid_time < '28/10/03 00:00'

takes 4 mins 32 secs.


How many rows should that return?
[explain analyze will tell you that]


"explain analyze" doesn't seem to be part of this postgres version
(or I misunderstood something).
That particular query returned 24 rows.

and while that runs is your disk thrashing? vmstat's bi/bo columns will
tell you.


The machine's over the other side of the building, so I can't physically
see if the disk is thrashing.
I'm not sure how to interpret the vmstat output; running "vmstat 1" shows
me bi/bo both at zero (mostly) until I start the query. Then bi shoots up
to around 2500 (bo remains around zero) until the query finishes.

7.1 is quite old, but I do understand the pain of migrating to 7.3 [or
7.4beta] with huge DBs


Is the upgrade likely to make a difference?


Well, it's likely to get you better help. Explain Analyze (added in 7.2,
IIRC) gets us information on the real time spent in operations as well as
the real number of rows.

But, back to the question: what is the definition of the index it's using?
If you don't already have an index on (station, valid_time), does making
one help?
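
(In SQL, something along these lines — the index name here is illustrative:)

CREATE INDEX obs_station_valid_time ON obs (station, valid_time);
-- station equality on the leading column, then the valid_time range scan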


Nov 12 '05 #8
"Dave Weaver" <da***@wsieurop e.com> writes:
Is the upgrade likely to make a difference?


I'm not sure if it would help for this specific query, but in general
each major PG release has useful performance improvements over the
previous one.

What I'm wondering about is an index-bloat problem (see the
pgsql-performance archives for discussions). Do you do a lot of updates
or deletes on this table, or is it just inserts? What is the physical
size of the table and its index? The output of VACUUM VERBOSE for this
table would be useful to show.

regards, tom lane


Nov 12 '05 #9
Tom Lane wrote:
Do you do a lot of updates or deletes on this table, or is it just
inserts?
Inserts and updates. No deletes.

What is the physical size of the table and its index?
How do I find out this information?
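
(One hedged way, via the system catalogs — relpages and reltuples are
refreshed by VACUUM, and a page is 8 kB by default, so size ≈ relpages × 8 kB:)

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('obs', 'obs_pkey', 'obs_station', 'obs_valid_time');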

The output of VACUUM VERBOSE for this table would be useful to show.


obs=> vacuum verbose obs;
NOTICE: --Relation obs--
NOTICE: Pages 276896: Changed 2776, reaped 67000, Empty 0, New 0;
Tup 13739326: Vac 78031, Keep/VTL 3141/3141, Crash 0, UnUsed 303993,
MinLen 72, MaxLen 476; Re-using: Free/Avail. Space 16174372/14995020;
EndEmpty/Avail. Pages 0/18004. CPU 26.11s/3.78u sec.
NOTICE: Index obs_pkey: Pages 114870; Tuples 13739326: Deleted 37445. CPU
12.33s/39.86u sec.
NOTICE: Index obs_valid_time: Pages 45713; Tuples 13739326: Deleted 37445.
CPU 4.38s/37.65u sec.
NOTICE: InvalidateSharedInvalid: cache state reset
NOTICE: Index obs_station: Pages 53170; Tuples 13739326: Deleted 37445. CPU
6.46s/56.63u sec.
NOTICE: Rel obs: Pages: 276896 --> 275200; Tuple(s) moved: 30899. CPU
33.94s/51.05u sec.
NOTICE: Index obs_pkey: Pages 114962; Tuples 13739326: Deleted 30881. CPU
13.24s/19.80u sec.
NOTICE: Index obs_valid_time: Pages 45819; Tuples 13739326: Deleted 30881.
CPU 4.51s/17.42u sec.
NOTICE: Index obs_station: Pages 53238; Tuples 13739326: Deleted 30881. CPU
5.78s/18.33u sec.
NOTICE: --Relation pg_toast_503832058--
NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail.
Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE: Index pg_toast_503832058_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u
sec.
VACUUM
obs=>
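
(Scale check, assuming the default 8 kB page size: the table at ~276,000
pages is about 2.1 GB, obs_pkey at ~115,000 pages about 0.9 GB, and
obs_valid_time plus obs_station together another ~0.8 GB — roughly 3.8 GB
of disk against 512 MB of RAM, so any plan touching a large fraction of it
will be disk-bound.)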
Cheers,
Dave.


Nov 12 '05 #10
