
Join query on 1M row table slow

CSN
I have a pretty simple select query that joins a table
(p) with 125K rows with another table (pc) with almost
one million rows:

select p.*
from product_categories pc
inner join products p
on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title
limit 25
offset $offset

The query usually takes about five seconds to execute
(all other PG queries perform fast enough). I have
indexes on everything needed, and EXPLAIN shows
they're being used. Is there anything else I can do to
improve performance - such as tweaking some settings
in the config?

Redhat 9, PG 7.4.1.
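
For orientation, the schema implied by the thread looks roughly like this
(a sketch - the column lists beyond id/title/product_id/category_id are
assumptions, but the index names are the ones that appear in the EXPLAIN
output further down):

CREATE TABLE products (
    id    integer PRIMARY KEY,  -- scanned via pkey_products_id in the plans
    title text                  -- the ORDER BY column
    -- ... more columns; the plans' width=290 suggests a fairly wide row
);

CREATE TABLE product_categories (
    product_id  integer REFERENCES products (id),
    category_id integer
);

-- the index the plans use for the category filter
CREATE INDEX idx_pc_category_id ON product_categories (category_id);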



Nov 22 '05 #1
On Tue, 10 Feb 2004, CSN wrote:
> I have a pretty simple select query that joins a table (p) with 125K
> rows with another table (pc) with almost one million rows:
> [query and rest of the original question snipped - see #1]
>
> Redhat 9, PG 7.4.1.

Could you give more info on the hardware? You did not mention how often
you run VACUUM ANALYZE, or how often data is updated/deleted. The more
info you provide, the more we can try to help.

How about your buffer and other settings?
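
For reference, these are the 7.4-era postgresql.conf settings usually
asked about first (the parameter names are real 7.4 GUCs - sort_mem was
later renamed work_mem - but the values below are illustrative
placeholders, not a recommendation for this box):

shared_buffers = 4096           # shared cache, in 8 KB pages (~32 MB)
sort_mem = 8192                 # per-sort memory in KB - affects the ORDER BY
effective_cache_size = 65536    # planner's estimate of OS cache, in 8 KB pages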

Nov 22 '05 #2
On Tuesday 10 February 2004 19:51, CSN wrote:
> I have a pretty simple select query that joins a table (p) with 125K
> rows with another table (pc) with almost one million rows:
> [query and rest of the original question snipped - see #1]

Quite possibly - I'd suggest:
1. Read "Performance Tuning" and the "Annotated conf file" at
http://www.varlena.com/varlena/Gener...bits/index.php
2. Vacuum analyze the tables concerned and post the output of EXPLAIN
ANALYSE with your query (a command sketch follows below).
3. Next time, try the -performance list - you'll probably get a faster
response there.
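
A minimal sketch of step 2, using the table names from this thread (the
query is the original one, with a sample category id filled in):

VACUUM ANALYZE product_categories;
VACUUM ANALYZE products;

EXPLAIN ANALYZE
SELECT p.*
FROM product_categories pc
INNER JOIN products p ON pc.product_id = p.id
WHERE pc.category_id = 1016
ORDER BY p.title
LIMIT 25 OFFSET 0;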

--
Richard Huxton
Archonet Ltd


Nov 22 '05 #3
CSN
> 2. Vacuum analyze the tables concerned and post the output of EXPLAIN
> ANALYSE with your query.


=> explain analyze;

results in:

ERROR: syntax error at or near ";" at character 16



Nov 22 '05 #4
On Tue, 10 Feb 2004, CSN wrote:
> I have a pretty simple select query that joins a table (p) with 125K
> rows with another table (pc) with almost one million rows:
> [query and rest of the original question snipped - see #1]

The problem is that in order to do an offset/limit on such a set,
postgresql is gonna have to generate $offset + limit rows of the joined
set. So it's quite likely that it's generating the whole set first: with
offset 975, for instance, at least the first 1,000 joined rows have to be
built and sorted before 975 of them are thrown away.

It also looks odd having a select p.* from product_categories pc, but not
selecting anything from the pc table.

Could this be rewritten to something like

select p.* from products p where p.id in (select
product_id from product_categories pc where pc.category_id = $category_id)
order by p.title limit 25 offset $offset

? Or is that equivalent?
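
It's equivalent only on the assumption that product_categories never
repeats a (product_id, category_id) pair: the join form emits one output
row per matching pc row, while the IN form returns each product at most
once. A unique constraint (hypothetical - the thread doesn't show one)
would make the two forms interchangeable:

-- hypothetical constraint name; guarantees the join and IN forms agree
ALTER TABLE product_categories
    ADD CONSTRAINT product_categories_pid_cid_uniq
    UNIQUE (product_id, category_id);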

Nov 22 '05 #5
CSN

Here's the EXPLAIN:

Limit  (cost=9595.99..9596.05 rows=25 width=290)
  ->  Sort  (cost=9595.99..9598.45 rows=986 width=290)
        Sort Key: p.title
        ->  Nested Loop  (cost=0.00..9546.96 rows=986 width=290)
              ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4)
                    Index Cond: (category_id = 1016)
              ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290)
                    Index Cond: ("outer".product_id = p.id)
(8 rows)

Is the "cost" high?

CSN
--- li***@natserv.com wrote:
> [original question and the hardware/VACUUM questions from #2 snipped]



Nov 22 '05 #6
On Tue, 10 Feb 2004, CSN wrote:
> > 2. Vacuum analyze the tables concerned and post the
> > output of EXPLAIN ANALYSE with your query.
>
> => explain analyze;
>
> results in:
>
> ERROR: syntax error at or near ";" at character 16


No, silly. You do:

explain analyze select ... (rest of the query...)

and it runs the query and tells you how long each bit took, what it
THOUGHT it would get back in terms of number of rows, and what it
actually got back.

Let us know...

Nov 22 '05 #7
CSN

--- "scott.marl owe" <sc***********@ ihs.com> wrote:
On Tue, 10 Feb 2004, CSN wrote:
I have a pretty simple select query that joins a

table
(p) with 125K rows with another table (pc) with

almost
one million rows:

select p.*
from product_categor ies pc
inner join products p
on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title
limit 25
offset $offset

The query usually takes about five seconds to

execute
(all other PG queries perform fast enough). I have
indexes on everything needed, and EXPLAIN shows
they're being used. Is there anything else I can

do to
improve performance - such as tweaking some

settings
in the config?


The problem is that in order to do an offset / limit
on such a set,
postgresql is gonna have to generate $offset + limit
of the joined set.

So, it's quite likely that it's generating the whole
set first.

It also looks odd having a select p.* from
product_cat pc, but not
selecting anything from the pc table.

Could this be rewritten to something like

select p.* from products p where p.id in (select
product_id from product_categor ies pc where
pc.category_id = $category_id)
order by p.title limit 25 offset $offset

? Or is that equivalent?


I think that probably improves things (lower cost? -
see my other post):

explain select p.* from products p where p.id in (
select product_id from product_categories pc where
pc.category_id = 1016) order by p.title limit 25
offset 0;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit  (cost=4282.18..4282.24 rows=25 width=290)
  ->  Sort  (cost=4282.18..4282.46 rows=111 width=290)
        Sort Key: p.title
        ->  Nested Loop  (cost=3609.75..4278.41 rows=111 width=290)
              ->  HashAggregate  (cost=3609.75..3609.75 rows=111 width=4)
                    ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4)
                          Index Cond: (category_id = 1016)
              ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290)
                    Index Cond: (p.id = "outer".product_id)
(9 rows)
I figured the limit/offset was probably causing the
problem. What's weird is that when the same query is
executed again, it seems much faster - some sort of
caching maybe?

(table pc is just product_id <=> category_id - I don't
really need the category_id)

CSN


Nov 22 '05 #8
On Tue, 10 Feb 2004, CSN wrote:

> I think that probably improves things (lower cost? - see my other
> post):
> [rewritten query and its EXPLAIN plan snipped - see #8]
>
> I figured the limit/offset was probably causing the problem. What's
> weird is that when the same query is executed again, it seems much
> faster - some sort of caching maybe?
Yep. Exactly.

What does explain analyze say about the two queries?

> (table pc is just product_id <=> category_id - I don't really need
> the category_id)

If you could eliminate the need for that table in this query, you should
get it to run much faster.
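
One way to act on that - strictly a sketch, and only valid under the
assumption (not stated in the thread) that each product belongs to exactly
one category - is to denormalize the category onto products and index it
together with the sort key:

-- hypothetical column and index names
ALTER TABLE products ADD COLUMN category_id integer;

UPDATE products
SET category_id = pc.category_id
FROM product_categories pc
WHERE pc.product_id = products.id;

CREATE INDEX idx_products_cat_title ON products (category_id, title);

-- the paginated query then touches only one table
SELECT *
FROM products
WHERE category_id = 1016
ORDER BY title
LIMIT 25 OFFSET 0;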

Nov 22 '05 #9
CSN

Doh! Yeah, now I remember ;)

QUERY 1:

=> explain analyze select p.* from product_categories pc
   inner join products p on pc.product_id = p.id
   where pc.category_id = 1016
   order by p.title limit 25 offset 0;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=9595.99..9596.05 rows=25 width=290) (actual time=514.808..514.942 rows=25 loops=1)
  ->  Sort  (cost=9595.99..9598.45 rows=986 width=290) (actual time=514.794..514.868 rows=25 loops=1)
        Sort Key: p.title
        ->  Nested Loop  (cost=0.00..9546.96 rows=986 width=290) (actual time=0.672..421.732 rows=2358 loops=1)
              ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.343..125.762 rows=2358 loops=1)
                    Index Cond: (category_id = 1016)
              ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.075..0.083 rows=1 loops=2358)
                    Index Cond: ("outer".product_id = p.id)
Total runtime: 516.174 ms
(9 rows)
QUERY 2:

=> explain analyze select p.* from products p where p.id in
   (select product_id from product_categories pc where pc.category_id = 1016)
   order by p.title limit 25 offset 0;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=4282.18..4282.24 rows=25 width=290) (actual time=447.852..447.979 rows=25 loops=1)
  ->  Sort  (cost=4282.18..4282.46 rows=111 width=290) (actual time=447.836..447.904 rows=25 loops=1)
        Sort Key: p.title
        ->  Nested Loop  (cost=3609.75..4278.41 rows=111 width=290) (actual time=104.256..358.182 rows=2358 loops=1)
              ->  HashAggregate  (cost=3609.75..3609.75 rows=111 width=4) (actual time=103.922..114.836 rows=2358 loops=1)
                    ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.401..92.253 rows=2358 loops=1)
                          Index Cond: (category_id = 1016)
              ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.069..0.076 rows=1 loops=2358)
                    Index Cond: (p.id = "outer".product_id)
Total runtime: 449.370 ms
(10 rows)
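
Worth noticing in both plans: the planner estimated rows=986 for the
category-1016 index scan (and rows=111 after the HashAggregate) but
actually got rows=2358, and those estimates drive the plan choice. If a
plain ANALYZE doesn't tighten them, one standard remedy (a sketch, not
something suggested in the thread) is to raise the statistics target for
the column and re-analyze:

ALTER TABLE product_categories ALTER COLUMN category_id SET STATISTICS 100;
ANALYZE product_categories;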
-CSN
--- "scott.marl owe" <sc***********@ ihs.com> wrote:
On Tue, 10 Feb 2004, CSN wrote:
2. Vacuum analyze the tables concerned and post the output of EXPLAIN ANALYSE
with your query.


=> explain analyze;

results in:

ERROR: syntax error at or near ";" at character

16

No silly. you do:

explain analyze select ... (rest of the query...)

and it runs the query and tells you how long each
bit took and what it
THOUGHT it would get back in terms of number of rows
and what it actually
got back.

Let us know...



Nov 22 '05 #10
