Join query on 1M row table slow

CSN
I have a pretty simple select query that joins a table
(p) with 125K rows with another table (pc) with almost
one million rows:

select p.*
from product_categories pc
inner join products p
on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title
limit 25
offset $offset

The query usually takes about five seconds to execute
(all other PG queries perform fast enough). I have
indexes on everything needed, and EXPLAIN shows
they're being used. Is there anything else I can do to
improve performance - such as tweaking some settings
in the config?

Redhat 9, PG 7.4.1.


Nov 22 '05 #1
On Tue, 10 Feb 2004, CSN wrote:
I have a pretty simple select query that joins a table
(p) with 125K rows with another table (pc) with almost
one million rows:

select p.*
from product_categories pc
inner join products p
on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title
limit 25
offset $offset

The query usually takes about five seconds to execute
(all other PG queries perform fast enough). I have
indexes on everything needed, and EXPLAIN shows
they're being used. Is there anything else I can do to
improve performance - such as tweaking some settings
in the config?

Redhat 9, PG 7.4.1.


Could you give more info on the hardware?
You did not mention how often you do your vacuum analyze or how often data
is updated/deleted. The more info you provide the more we can try to
help.

How about your buffer and other settings?


Nov 22 '05 #2
On Tuesday 10 February 2004 19:51, CSN wrote:
I have a pretty simple select query that joins a table
(p) with 125K rows with another table (pc) with almost
one million rows:

select p.*
from product_categories pc
inner join products p
on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title
limit 25
offset $offset

The query usually takes about five seconds to execute
(all other PG queries perform fast enough). I have
indexes on everything needed, and EXPLAIN shows
they're being used. Is there anything else I can do to
improve performance - such as tweaking some settings
in the config?


Quite possibly - I'd suggest:
1. Read "Performance Tuning" and the "Annotated conf file" at
http://www.varlena.com/varlena/Gener...bits/index.php
2. Vacuum analyze the tables concerned and post the output of EXPLAIN ANALYSE
with your query (a sketch of both steps follows below).
3. Next time, try the -performance list - you'll probably get a faster
response there.
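A minimal sketch of step 2 against the tables in this thread (category_id 1016 is taken from the plans posted later in the thread; substitute your own value):

vacuum analyze product_categories;
vacuum analyze products;

explain analyze
select p.*
from product_categories pc
inner join products p on pc.product_id = p.id
where pc.category_id = 1016
order by p.title
limit 25 offset 0;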

--
Richard Huxton
Archonet Ltd


Nov 22 '05 #3
CSN
> 2. Vacuum analyze the tables concerned and post the
> output of EXPLAIN ANALYSE with your query.


=> explain analyze;

results in:

ERROR: syntax error at or near ";" at character 16


Nov 22 '05 #4
On Tue, 10 Feb 2004, CSN wrote:
I have a pretty simple select query that joins a table
(p) with 125K rows with another table (pc) with almost
one million rows:

select p.*
from product_categories pc
inner join products p
on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title
limit 25
offset $offset

The query usually takes about five seconds to execute
(all other PG queries perform fast enough). I have
indexes on everything needed, and EXPLAIN shows
they're being used. Is there anything else I can do to
improve performance - such as tweaking some settings
in the config?


The problem is that in order to do an offset / limit on such a set,
postgresql is gonna have to generate $offset + limit of the joined set.

So, it's quite likely that it's generating the whole set first.

It also looks odd having a select p.* from product_cat pc, but not
selecting anything from the pc table.

Could this be rewritten to something like

select p.* from products p
where p.id in (select product_id from product_categories pc
               where pc.category_id = $category_id)
order by p.title limit 25 offset $offset

? Or is that equivalent?

Nov 22 '05 #5
CSN

Here's the EXPLAIN:

 Limit  (cost=9595.99..9596.05 rows=25 width=290)
   ->  Sort  (cost=9595.99..9598.45 rows=986 width=290)
         Sort Key: p.title
         ->  Nested Loop  (cost=0.00..9546.96 rows=986 width=290)
               ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4)
                     Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290)
                     Index Cond: ("outer".product_id = p.id)
(8 rows)

Is the "cost" high?

CSN
--- li***@natserv.com wrote:
On Tue, 10 Feb 2004, CSN wrote:
I have a pretty simple select query that joins a table
(p) with 125K rows with another table (pc) with almost
one million rows:

select p.*
from product_categories pc
inner join products p
on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title
limit 25
offset $offset

The query usually takes about five seconds to execute
(all other PG queries perform fast enough). I have
indexes on everything needed, and EXPLAIN shows
they're being used. Is there anything else I can do to
improve performance - such as tweaking some settings
in the config?

Redhat 9, PG 7.4.1.

Could you give more info on the hardware?
You did not mention how often you do your vacuum analyze or how often
data is updated/deleted. The more info you provide the more we can try
to help.

How about your buffer and other settings?


Nov 22 '05 #6
On Tue, 10 Feb 2004, CSN wrote:
> > 2. Vacuum analyze the tables concerned and post the
> > output of EXPLAIN ANALYSE with your query.
>
> => explain analyze;
>
> results in:
>
> ERROR: syntax error at or near ";" at character 16


No, silly. You do:

explain analyze select ... (rest of the query...)

and it runs the query and tells you how long each bit took and what it
THOUGHT it would get back in terms of number of rows and what it actually
got back.

Let us know...

Nov 22 '05 #7
CSN

--- "scott.marlowe" <sc***********@ihs.com> wrote:
On Tue, 10 Feb 2004, CSN wrote:
I have a pretty simple select query that joins a table
(p) with 125K rows with another table (pc) with almost
one million rows:

select p.*
from product_categories pc
inner join products p
on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title
limit 25
offset $offset

The query usually takes about five seconds to execute
(all other PG queries perform fast enough). I have
indexes on everything needed, and EXPLAIN shows
they're being used. Is there anything else I can do to
improve performance - such as tweaking some settings
in the config?

The problem is that in order to do an offset / limit on such a set,
postgresql is gonna have to generate $offset + limit of the joined set.

So, it's quite likely that it's generating the whole set first.

It also looks odd having a select p.* from product_cat pc, but not
selecting anything from the pc table.

Could this be rewritten to something like

select p.* from products p
where p.id in (select product_id from product_categories pc
               where pc.category_id = $category_id)
order by p.title limit 25 offset $offset

? Or is that equivalent?


I think that probably improves things (lower cost? - see my other post):

explain select p.* from products p where p.id in (
select product_id from product_categories pc where
pc.category_id = 1016) order by p.title limit 25
offset 0;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4282.18..4282.24 rows=25 width=290)
   ->  Sort  (cost=4282.18..4282.46 rows=111 width=290)
         Sort Key: p.title
         ->  Nested Loop  (cost=3609.75..4278.41 rows=111 width=290)
               ->  HashAggregate  (cost=3609.75..3609.75 rows=111 width=4)
                     ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4)
                           Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290)
                     Index Cond: (p.id = "outer".product_id)
(9 rows)

I figured the limit/offset was probably causing the
problem. What's weird is that when the same query is
executed again, it seems much faster - some sort of
caching maybe?

(table pc is just product_id <=> category_id - I don't
really need the category_id)

CSN

Nov 22 '05 #8
On Tue, 10 Feb 2004, CSN wrote:

> I think that probably improves things (lower cost? - see my other post):
>
> explain select p.* from products p where p.id in (
> select product_id from product_categories pc where
> pc.category_id = 1016) order by p.title limit 25
> offset 0;
>
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=4282.18..4282.24 rows=25 width=290)
>    ->  Sort  (cost=4282.18..4282.46 rows=111 width=290)
>          Sort Key: p.title
>          ->  Nested Loop  (cost=3609.75..4278.41 rows=111 width=290)
>                ->  HashAggregate  (cost=3609.75..3609.75 rows=111 width=4)
>                      ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4)
>                            Index Cond: (category_id = 1016)
>                ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290)
>                      Index Cond: (p.id = "outer".product_id)
> (9 rows)
>
> I figured the limit/offset was probably causing the
> problem. What's weird is that when the same query is
> executed again, it seems much faster - some sort of
> caching maybe?
Yep. Exactly.

What does explain analyze say about the two queries?
> (table pc is just product_id <=> category_id - I don't
> really need the category_id)


If you could eliminate the need for that table in this query you should
get it to run much faster.

Nov 22 '05 #9
CSN

Doh! Yeah, now I remember ;)

QUERY 1:

=> explain analyze select p.* from product_categories
pc inner join products p on pc.product_id = p.id where
pc.category_id = 1016 order by p.title limit 25 offset
0;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9595.99..9596.05 rows=25 width=290) (actual time=514.808..514.942 rows=25 loops=1)
   ->  Sort  (cost=9595.99..9598.45 rows=986 width=290) (actual time=514.794..514.868 rows=25 loops=1)
         Sort Key: p.title
         ->  Nested Loop  (cost=0.00..9546.96 rows=986 width=290) (actual time=0.672..421.732 rows=2358 loops=1)
               ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.343..125.762 rows=2358 loops=1)
                     Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.075..0.083 rows=1 loops=2358)
                     Index Cond: ("outer".product_id = p.id)
 Total runtime: 516.174 ms
(9 rows)

QUERY 2:

=> explain analyze select p.* from products p where
p.id in ( select product_id from product_categories pc
where pc.category_id = 1016) order by p.title limit 25
offset 0;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4282.18..4282.24 rows=25 width=290) (actual time=447.852..447.979 rows=25 loops=1)
   ->  Sort  (cost=4282.18..4282.46 rows=111 width=290) (actual time=447.836..447.904 rows=25 loops=1)
         Sort Key: p.title
         ->  Nested Loop  (cost=3609.75..4278.41 rows=111 width=290) (actual time=104.256..358.182 rows=2358 loops=1)
               ->  HashAggregate  (cost=3609.75..3609.75 rows=111 width=4) (actual time=103.922..114.836 rows=2358 loops=1)
                     ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.401..92.253 rows=2358 loops=1)
                           Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.069..0.076 rows=1 loops=2358)
                     Index Cond: (p.id = "outer".product_id)
 Total runtime: 449.370 ms
(10 rows)
-CSN
--- "scott.marlowe" <sc***********@ihs.com> wrote:
> On Tue, 10 Feb 2004, CSN wrote:
> > > 2. Vacuum analyze the tables concerned and post the
> > > output of EXPLAIN ANALYSE with your query.
> >
> > => explain analyze;
> >
> > results in:
> >
> > ERROR: syntax error at or near ";" at character 16
>
> No, silly. You do:
>
> explain analyze select ... (rest of the query...)
>
> and it runs the query and tells you how long each bit took and what it
> THOUGHT it would get back in terms of number of rows and what it actually
> got back.
>
> Let us know...


Nov 22 '05 #10
CSN

--- li***@natserv.com wrote:
> On Tue, 10 Feb 2004, CSN wrote:
> > I have a pretty simple select query that joins a table
> > (p) with 125K rows with another table (pc) with almost
> > one million rows:
> >
> > select p.*
> > from product_categories pc
> > inner join products p
> > on pc.product_id = p.id
> > where pc.category_id = $category_id
> > order by p.title
> > limit 25
> > offset $offset
> >
> > The query usually takes about five seconds to execute
> > (all other PG queries perform fast enough). I have
> > indexes on everything needed, and EXPLAIN shows
> > they're being used. Is there anything else I can do to
> > improve performance - such as tweaking some settings
> > in the config?
> >
> > Redhat 9, PG 7.4.1.
>
> Could you give more info on the hardware?

Intel(R) Celeron(R) CPU 1.70GHz
1 GB RAM

> You did not mention how often you do your vacuum
> analyze or how often data is updated/deleted. The more
> info you provide the more we can try to help.

I've done both vacuum and vacuum analyze on the
database. Vacuum full takes forever (I haven't let it
run its full course yet). The data is completely
static - no updates/deletes, just selects.

> How about your buffer and other settings?

shared_buffers = 1000

That's about the only setting I changed in
postgresql.conf.

TIA,
CSN
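For scale, shared_buffers = 1000 on PostgreSQL 7.4 is only about 8 MB of the 1 GB of RAM. A hedged sketch of postgresql.conf values more in proportion to this machine (illustrative assumptions, not measured tuning; 7.4 counts shared_buffers and effective_cache_size in 8 KB pages):

shared_buffers = 8192          # ~64 MB of shared buffer cache
effective_cache_size = 65536   # ~512 MB; a hint about the kernel's file cache
sort_mem = 8192                # KB per sort (renamed work_mem in 8.0)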


Nov 22 '05 #11
CSN

--- "scott.marlowe" <sc***********@ihs.com> wrote:
> > What's weird is that when the same query is
> > executed again, it seems much faster - some sort
> > of caching maybe?
>
> Yep. Exactly.

Really? Where can I RTFM about it?

CSN

Nov 22 '05 #12
Well, it looks like the number of rows estimate for the nested loop in the
first query and the hash agg in the second are off by a factor of 3 for the
first query, and a factor of 20 for the second. Try running number 1 with

set enable_nestloop = off

and see if the first one gets faster.

You might also wanna try turning off hash aggregate on the second one and
see how that works.

Upping the analyze target on those two tables may help a bit too. (A sketch
of these session-level toggles follows below.)
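A minimal sketch of the experiment being suggested, using the queries from earlier in the thread (the SETs are session-local; turn the flags back on afterwards):

set enable_nestloop = off;
explain analyze select p.* from product_categories pc
inner join products p on pc.product_id = p.id
where pc.category_id = 1016
order by p.title limit 25 offset 0;
set enable_nestloop = on;

set enable_hashagg = off;
explain analyze select p.* from products p
where p.id in (select product_id from product_categories pc
               where pc.category_id = 1016)
order by p.title limit 25 offset 0;
set enable_hashagg = on;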

On Tue, 10 Feb 2004, CSN wrote:
Doh! Yeah, now I remember ;)

QUERY 1:

=> explain analyze select p.* from product_categories
pc inner join products p on pc.product_id = p.id where
pc.category_id = 1016 order by p.title limit 25 offset
0;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9595.99..9596.05 rows=25 width=290) (actual time=514.808..514.942 rows=25 loops=1)
   ->  Sort  (cost=9595.99..9598.45 rows=986 width=290) (actual time=514.794..514.868 rows=25 loops=1)
         Sort Key: p.title
         ->  Nested Loop  (cost=0.00..9546.96 rows=986 width=290) (actual time=0.672..421.732 rows=2358 loops=1)
               ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.343..125.762 rows=2358 loops=1)
                     Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.075..0.083 rows=1 loops=2358)
                     Index Cond: ("outer".product_id = p.id)
 Total runtime: 516.174 ms
(9 rows)

QUERY 2:

=> explain analyze select p.* from products p where
p.id in ( select product_id from product_categories pc
where pc.category_id = 1016) order by p.title limit 25
offset 0;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4282.18..4282.24 rows=25 width=290) (actual time=447.852..447.979 rows=25 loops=1)
   ->  Sort  (cost=4282.18..4282.46 rows=111 width=290) (actual time=447.836..447.904 rows=25 loops=1)
         Sort Key: p.title
         ->  Nested Loop  (cost=3609.75..4278.41 rows=111 width=290) (actual time=104.256..358.182 rows=2358 loops=1)
               ->  HashAggregate  (cost=3609.75..3609.75 rows=111 width=4) (actual time=103.922..114.836 rows=2358 loops=1)
                     ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.401..92.253 rows=2358 loops=1)
                           Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.069..0.076 rows=1 loops=2358)
                     Index Cond: (p.id = "outer".product_id)
 Total runtime: 449.370 ms
(10 rows)
-CSN
--- "scott.marlowe" <sc***********@ihs.com> wrote:
On Tue, 10 Feb 2004, CSN wrote:
> 2. Vacuum analyze the tables concerned and post

the
> output of EXPLAIN ANALYSE
> with your query.

=> explain analyze;

results in:

ERROR: syntax error at or near ";" at character

16

No silly. you do:

explain analyze select ... (rest of the query...)

and it runs the query and tells you how long each
bit took and what it
THOUGHT it would get back in terms of number of rows
and what it actually
got back.

Let us know...

__________________________________
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
http://taxes.yahoo.com/filing.html

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 22 '05 #13
On Tue, 10 Feb 2004, CSN wrote:

--- "scott.marlowe" <sc***********@ihs.com> wrote:
What's weird is that when the same query

is
executed again, it seems much faster - some sort

of
caching maybe?>


Yep. Exactly.


Really? Where can I RTFM about it?


Not sure. Basically, your kernel should be caching a fair bit. See what
top says about cache and buffer size. If you've got a big enough kernel
buffer, and the table gets hit often enough, it should stay in kernel
memory.

Nov 22 '05 #14
CSN

I disabled enable_hashagg and enable_nestloop. Appears
to have made both queries worse :(

QUERY 1:

=> explain analyze select p.* from product_categories
pc inner join products p on pc.product_id = p.id where
pc.category_id = 1016 order by p.title limit 25 offset
10000;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=65999.78..65999.78 rows=1 width=290) (actual time=7736.029..7736.029 rows=0 loops=1)
   ->  Sort  (cost=65997.31..65999.78 rows=986 width=290) (actual time=7723.794..7730.352 rows=2358 loops=1)
         Sort Key: p.title
         ->  Merge Join  (cost=65306.35..65948.28 rows=986 width=290) (actual time=7028.790..7614.223 rows=2358 loops=1)
               Merge Cond: ("outer".product_id = "inner".id)
               ->  Sort  (cost=3656.31..3658.78 rows=986 width=4) (actual time=102.115..105.357 rows=2358 loops=1)
                     Sort Key: pc.product_id
                     ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.349..94.173 rows=2358 loops=1)
                           Index Cond: (category_id = 1016)
               ->  Sort  (cost=61650.04..61963.62 rows=125430 width=290) (actual time=6926.394..7272.130 rows=124521 loops=1)
                     Sort Key: p.id
                     ->  Seq Scan on products p  (cost=0.00..6638.30 rows=125430 width=290) (actual time=0.102..2855.358 rows=124753 loops=1)
 Total runtime: 8003.067 ms
(13 rows)

QUERY 2:

=> explain analyze select p.* from products p where
p.id in ( select product_id from product_categories pc
where pc.category_id = 1016) order by p.title limit 25
offset 0;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10931.85..10931.91 rows=25 width=290) (actual time=3667.396..3667.526 rows=25 loops=1)
   ->  Sort  (cost=10931.85..10932.13 rows=111 width=290) (actual time=3667.384..3667.453 rows=25 loops=1)
         Sort Key: p.title
         ->  Hash Join  (cost=3661.52..10928.08 rows=111 width=290) (actual time=111.198..1615.324 rows=2358 loops=1)
               Hash Cond: ("outer".id = "inner".product_id)
               ->  Seq Scan on products p  (cost=0.00..6638.30 rows=125430 width=290) (actual time=0.113..1039.900 rows=124753 loops=1)
               ->  Hash  (cost=3661.24..3661.24 rows=111 width=4) (actual time=110.932..110.932 rows=0 loops=1)
                     ->  Unique  (cost=3656.31..3661.24 rows=111 width=4) (actual time=97.255..106.798 rows=2358 loops=1)
                           ->  Sort  (cost=3656.31..3658.78 rows=986 width=4) (actual time=97.247..99.998 rows=2358 loops=1)
                                 Sort Key: pc.product_id
                                 ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3607.28 rows=986 width=4) (actual time=0.327..88.436 rows=2358 loops=1)
                                       Index Cond: (category_id = 1016)
 Total runtime: 3669.479 ms
(13 rows)

> upping the analyze target on those two tables may
> help a bit too.

How exactly do I do that?

SELECT * from thanks limit 1000
;)

CSN

--- "scott.marlowe" <sc***********@ihs.com> wrote: Well, it looks like the number of rows estimate for
the nested loop in the
first query and the hash agg in the second are off
by a factor 3 for the
first query, and a factor of 20 for the second. Try
running number 1 with
set enable_nestloop = off
and see if the first one gets faster.

You might also wanna try turning off hash aggregate
on the second one and
see how that works.

upping the analyze target on those two tables may
help a bit too.



Nov 22 '05 #15
On Tue, 10 Feb 2004, CSN wrote:

> I disabled enable_hashagg and enable_nestloop. Appears
> to have made both queries worse :(

Good, then we know that the nested loop and hash agg are probably good
plans.

> How exactly do I do that?
>
> SELECT * from thanks limit 1000
> ;)


it's an alter table thingie:

alter table tablename alter column columnname set statistics 100;
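Applied to the tables in this thread, that might look like the sketch below (which columns to target is an assumption; the new target only takes effect at the next analyze):

alter table product_categories alter column category_id set statistics 100;
alter table product_categories alter column product_id set statistics 100;
analyze product_categories;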

But since it looks like it's picking a good plan, it's probably not a real
big deal.

So, can you get rid of the join / in on the other table, or do you need
it there?

Nov 22 '05 #16
CSN
> So, can you get rid of the join / in on the other
> table, or do you need it there?


No - each product can fall under multiple categories,
so I need the product_categories table (and the join
on it).

Thanks again,
CSN


Nov 22 '05 #17
On Tue, 10 Feb 2004, CSN wrote:
> > So, can you get rid of the join / in on the other
> > table, or do you need it there?
>
> No - each product can fall under multiple categories,
> so I need the product_categories table (and the join
> on it).


Oh, OK. Then we might have as efficient a query as we're gonna get.

Oh, another thing is to make your select in the in() clause a select
distinct and see if that helps. If you've got a whole bunch of duplicates
running around in it, it's sometimes faster to distinct it. Then again,
given all the work Tom's been putting into the query planner / optimization,
he may have already done something like that.
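As a sketch, the rewrite being suggested (the IN subselect from earlier in the thread, with DISTINCT added):

select p.* from products p
where p.id in (select distinct product_id
               from product_categories pc
               where pc.category_id = 1016)
order by p.title limit 25 offset 0;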

Nov 22 '05 #18
"scott.marlowe" <sc***********@ihs.com> writes:
Oh, another thing is to make your select in the in() clause a select
distinct and see if that helps. If you've got a whole bunch of duplicates
running around in it it's sometimes faster to distinct it. Then again,
given all the work Tom's been putting in the query planner / optimization,
he may have already done something like that.


As of CVS tip I would not recommend inserting DISTINCT in an "IN
(subselect)" construct --- the planner will do the equivalent for itself
if it figures that's the best way. I do not recall this late at night
whether all the relevant changes were in 7.4 or not ...

regards, tom lane


Nov 22 '05 #19
I have found in previous versions of postgres that rewriting the join can help. Have you tried something like:

select p.*
from product_categories pc, products p
where pc.product_id = p.id AND pc.category_id = $category_id
order by p.title
limit 25
offset $offset
cheers
Matthew
--

Matthew Lunnon
Senior Software Engineer
RWA Ltd
www.rwa-net.co.uk

----- Original Message -----
From: CSN
To: pg***********@postgresql.org
Sent: Tuesday, February 10, 2004 7:51 PM
Subject: [GENERAL] Join query on 1M row table slow
I have a pretty simple select query that joins a table
(p) with 125K rows with another table (pc) with almost
one million rows:

select p.*
from product_categories pc
inner join products p
on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title
limit 25
offset $offset

The query usually takes about five seconds to execute
(all other PG queries perform fast enough). I have
indexes on everything needed, and EXPLAIN shows
they're being used. Is there anything else I can do to
improve performance - such as tweaking some settings
in the config?

Redhat 9, PG 7.4.1.


Nov 22 '05 #20
On Tue, 10 Feb 2004, CSN wrote:

I have a pretty simple select query that joins a table
(p) with 125K rows with another table (pc) with almost
one million rows:

select p.*
from product_categories pc
inner join products p
on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title
limit 25
offset $offset


This idiom looks to me a lot like "results paging". You have a query
that returns a lot of rows, and you are formatting them one page at a
time in your CGI or whatever.

In PostgreSQL, cursors do this very well:

BEGIN;
DECLARE resultset CURSOR FOR
select p.* from product_categories pc
inner join products p on pc.product_id = p.id
where pc.category_id = $category_id
order by p.title ;

MOVE $offset IN resultset;
FETCH 25 FROM resultset;
[ repeat as necessary ];
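-- sketch: when paging is finished, release the cursor and end the transaction
CLOSE resultset;
COMMIT;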

This does use some resources on the server side, but it is very much
faster than LIMIT/OFFSET.

The biggest "gotcha" about cursors is that their lifetime is limited to
the enclosing transaction, so they may not be appropriate for CGI-type
applications.

Bill Gribble


Nov 22 '05 #21
CSN

Appears to be somewhat slower - took about 600-2600ms
on different runs.

CSN
=> explain analyze select p.* from product_categories
pc, products p where pc.product_id = p.id AND
pc.category_id = 1016 order by p.title limit 25 offset
0;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9270.77..9270.83 rows=25 width=290) (actual time=2598.686..2598.875 rows=25 loops=1)
   ->  Sort  (cost=9270.77..9273.15 rows=952 width=290) (actual time=2598.677..2598.805 rows=25 loops=1)
         Sort Key: p.title
         ->  Nested Loop  (cost=0.00..9223.67 rows=952 width=290) (actual time=27.257..2485.644 rows=2358 loops=1)
               ->  Index Scan using idx_pc_category_id on product_categories pc  (cost=0.00..3493.30 rows=951 width=4) (actual time=26.819..396.049 rows=2358 loops=1)
                     Index Cond: (category_id = 1016)
               ->  Index Scan using pkey_products_id on products p  (cost=0.00..6.01 rows=1 width=290) (actual time=0.838..0.845 rows=1 loops=2358)
                     Index Cond: ("outer".product_id = p.id)
 Total runtime: 2600.395 ms
(9 rows)
--- Matthew Lunnon <ml*****@rwa-net.co.uk> wrote:
I have found in previous versions of postgres that rewriting the join can
help. Have you tried something like:

select p.*
from product_categories pc, products p
where pc.product_id = p.id AND pc.category_id = $category_id
order by p.title
limit 25
offset $offset

cheers
Matthew
--

Matthew Lunnon
Senior Software Engineer
RWA Ltd
www.rwa-net.co.uk



Nov 22 '05 #22
