Why does adding SUM and GROUP BY destroy performance?

David Link

Hi,

Why does adding SUM and GROUP BY destroy performance?
details follow.
Thanks, David Link

s1.sql:
SELECT
t.tid, t.title,
COALESCE(s0c100r100.units, 0) as w0c100r100units,
(COALESCE(r1c2r100.units, 0) + COALESCE(y0c2r100.units, 0))
as r0c2r100units
FROM
title t
JOIN upc u1 ON t.tid = u1.tid
LEFT OUTER JOIN sale_200331 s0c100r100 ON u1.upc = s0c100r100.upc
AND s0c100r100.week = 200331 AND s0c100r100.channel = 100
AND s0c100r100.region = 100
LEFT OUTER JOIN rtd r1c2r100 ON u1.upc = r1c2r100.upc
AND r1c2r100.year = 2002 AND r1c2r100.channel = 2
AND r1c2r100.region = 100
LEFT OUTER JOIN ytd_200331 y0c2r100 ON u1.upc = y0c2r100.upc
AND y0c2r100.week = 200331 AND y0c2r100.channel = 2
AND y0c2r100.region = 100
LEFT OUTER JOIN media m ON t.media = m.key
LEFT OUTER JOIN screen_format sf ON t.screen_format = sf.key
WHERE
t.distributor != 'CONTROL LABEL'
ORDER BY
t.title ASC
LIMIT 50
;
s2.sql:
SELECT
t.tid, t.title,
SUM(COALESCE(s0c100r100.units, 0)) as w0c100r100units,
SUM((COALESCE(r1c2r100.units, 0) + COALESCE(y0c2r100.units, 0)))
as r0c2r100units
FROM
title t
JOIN upc u1 ON t.tid = u1.tid
LEFT OUTER JOIN sale_200331 s0c100r100 ON u1.upc = s0c100r100.upc
AND s0c100r100.week = 200331 AND s0c100r100.channel = 100
AND s0c100r100.region = 100
LEFT OUTER JOIN rtd r1c2r100 ON u1.upc = r1c2r100.upc
AND r1c2r100.year = 2002 AND r1c2r100.channel = 2
AND r1c2r100.region = 100
LEFT OUTER JOIN ytd_200331 y0c2r100 ON u1.upc = y0c2r100.upc
AND y0c2r100.week = 200331 AND y0c2r100.channel = 2
AND y0c2r100.region = 100
LEFT OUTER JOIN media m ON t.media = m.key
LEFT OUTER JOIN screen_format sf ON t.screen_format = sf.key
WHERE
t.distributor != 'CONTROL LABEL'
GROUP BY
t.tid, t.title
ORDER BY
t.title ASC
LIMIT 50
;
Times:
s1.sql takes 0m0.124s
s2.sql takes 1m1.450s

Stats:
title table: 68,000 rows
sale_200331 table: 150,000 rows
ytd_200331 table: 0 rows
rtd table: 650,000 rows

Indexes are in place.

s1 explain plan:
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..65105.51 rows=50 width=132)
-> Nested Loop (cost=0.00..91726868.54 rows=70445 width=132)
Join Filter: ("outer".screen_format = "inner"."key")
-> Nested Loop (cost=0.00..91651668.74 rows=70445 width=127)
Join Filter: ("outer".media = "inner"."key")
-> Nested Loop (cost=0.00..91578053.95 rows=70445
width=122)
-> Nested Loop (cost=0.00..91236359.89
rows=70445 width=98)
-> Nested Loop (cost=0.00..90894665.82
rows=70445 width=74)
-> Nested Loop
(cost=0.00..90539626.76 rows=70445 width=50)
-> Index Scan using
title_title_ind on title t (cost=0.00..193051.67 rows=68775 width=38)
Filter: (distributor <>
'CONTROL LABEL'::character varying)
-> Index Scan using
davids_tid_index on upc u1 (cost=0.00..1309.24 rows=353 width=12)
Index Cond: ("outer".tid =
u1.tid)
-> Index Scan using
sale_200331_upc_wk_chl_reg_ind on sale_200331 s0c100r100
(cost=0.00..5.02 rows=1 width=24)
Index Cond: (("outer".upc =
s0c100r100.upc) AND (s0c100r100.week = 200331) AND (s0c100r100.channel
= 100) AND (s0c100r100.region = 100))
-> Index Scan using
rtd_upc_year_chl_reg_ind on rtd r1c2r100 (cost=0.00..4.83 rows=1
width=24)
Index Cond: (("outer".upc =
r1c2r100.upc) AND (r1c2r100."year" = 2002) AND (r1c2r100.channel = 2)
AND (r1c2r100.region = 100))
-> Index Scan using ytd_200331_upc_wkchlreg_ind
on ytd_200331 y0c2r100 (cost=0.00..4.83 rows=1 width=24)
Index Cond: (("outer".upc = y0c2r100.upc)
AND (y0c2r100.week = 200331) AND (y0c2r100.channel = 2) AND
(y0c2r100.region = 100))
-> Seq Scan on media m (cost=0.00..1.02 rows=2
width=5)
-> Seq Scan on screen_format sf (cost=0.00..1.03 rows=3
width=5)
(21 rows)
s2 explain plan:

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=403996.99..403997.11 rows=50 width=132)
-> Sort (cost=403996.99..404014.60 rows=7044 width=132)
Sort Key: t.title
-> Aggregate (cost=402393.74..403274.30 rows=7044 width=132)
-> Group (cost=402393.74..402922.08 rows=70445
width=132)
-> Sort (cost=402393.74..402569.86 rows=70445
width=132)
Sort Key: t.tid, t.title
-> Hash Join (cost=375382.76..392011.46
rows=70445 width=132)
Hash Cond: ("outer".screen_format =
"inner"."key")
-> Hash Join
(cost=375381.72..390997.78 rows=70445 width=127)
Hash Cond: ("outer".media =
"inner"."key")
-> Merge Join
(cost=375380.70..390057.49 rows=70445 width=122)
Merge Cond: ("outer".upc =
"inner".upc)
Join Filter:
(("inner".week = 200331) AND ("inner".channel = 2) AND ("inner".region
= 100))
-> Merge Join
(cost=375380.70..382782.40 rows=70445 width=98)
Merge Cond:
("outer".upc = "inner".upc)
Join Filter:
(("inner"."year" = 2002) AND ("inner".channel = 2) AND ("inner".region
= 100))
-> Sort
(cost=375310.87..375486.98 rows=70445 width=74)
Sort Key:
u1.upc
-> Nested
Loop (cost=6348.20..367282.53 rows=70445 width=74)
-> Hash
Join (cost=6348.20..12243.46 rows=70445 width=50)

Hash Cond: ("outer".tid = "inner".tid)
->
Seq Scan on upc u1 (cost=0.00..2795.28 rows=70628 width=12)
->
Hash (cost=4114.93..4114.93 rows=68775 width=38)

-> Seq Scan on title t (cost=0.00..4114.93 rows=68775 width=38)

Filter: (distributor <> 'CONTROL LABEL'::character varying)
->
Index Scan using sale_200331_upc_wk_chl_reg_ind on sale_200331
s0c100r100 (cost=0.00..5.02 rows=1 width=24)

Index Cond: (("outer".upc = s0c100r100.upc) AND (s0c100r100.week =
200331) AND (s0c100r100.channel = 100) AND (s0c100r100.region = 100))
-> Sort
(cost=69.83..72.33 rows=1000 width=24)
Sort Key:
r1c2r100.upc
-> Seq Scan
on rtd r1c2r100 (cost=0.00..20.00 rows=1000 width=24)
-> Index Scan using
ytd_200331_upc_wkchlreg_ind on ytd_200331 y0c2r100 (cost=0.00..52.00
rows=1000 width=24)
-> Hash (cost=1.02..1.02
rows=2 width=5)
-> Seq Scan on media m
(cost=0.00..1.02 rows=2 width=5)
-> Hash (cost=1.03..1.03 rows=3
width=5)
-> Seq Scan on screen_format sf
(cost=0.00..1.03 rows=3 width=5)
(36 rows)

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 11 '05 #1

Subscribe Post Reply

3828

Ron Johnson

On Wed, 2003-09-17 at 12:51, David Link wrote:

Hi,

Why does adding SUM and GROUP BY destroy performance?
details follow.
Thanks, David Link

PostgreSQL's generalized UDFs makes optimizing for the standard
aggregates very hard to do. There should be some improvement in
v7.4, though.

--
-----------------------------------------------------------------
Ron Johnson, Jr. ro***********@cox.net
Jefferson, LA USA

"Those who would give up essential Liberty to purchase a little
temporary safety, deserve neither Liberty nor safety." or
something like that
Ben Franklin, maybe
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 11 '05 #2

Christopher Browne

In the last exciting episode, dv****@yahoo.com (David Link) wrote:

Why does adding SUM and GROUP BY destroy performance?

When you use SUM (or other aggregates), there are no short cuts to
walking through each and every tuple specified by the WHERE clause.

On some systems there are statistics mechanisms that can short-circuit
that. On PostgreSQL, the use of MVCC to let new data "almost
magically appear" :-) has the demerit, in the case of aggregates, of
not leaving much opening for short cuts.

There are some cases where you CAN do much better than the aggregates
do.

SELECT MAX(FIELD) FROM TABLE WHERE A='THIS' and B='THAT';

may be replaced with the likely-to-be-faster:

select field from table where a = 'THIS' and b='THAT' order by field
desc limit 1;

MIN() admits a similar rewriting. If there is an index on FIELD, this
will likely be _way_ faster than using MIN()/MAX().

In a sense, it's not that aggregates "destroy" performance; just that
there are no magical shortcuts to make them incredibly fast.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','acm.org').
http://www.ntlug.org/~cbbrowne/multiplexor.html
"And 1.1.81 is officially BugFree(tm), so if you receive any bug
reports on it, you know they are just evil lies." -- Linus Torvalds

Nov 11 '05 #3

Ang Chin Han

Christopher Browne wrote:

In the last exciting episode, dv****@yahoo.com (David Link) wrote:
Why does adding SUM and GROUP BY destroy performance?

When you use SUM (or other aggregates), there are no short cuts to
walking through each and every tuple specified by the WHERE clause.

Er... not in this case, if I read David's email correctly.

His first query is walking through every tuple anyway.
His second query is the one summing them up, AFTER, here's the critical
part, GROUPing them by t.tid.

I suspect 7.4 (now in beta), or rewriting the query for <7.4 would speed
thing up. 7.4's Hash Aggregate would be the winner here.

As for rewriting this, David, try:

SELECT t.tid, t.title,
(select the stuff you want from lots of tables where something = t.tid)
FROM
title t;

Doubt it'll be as fast as using 7.4 though.

--
Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386
GNU/Linux
3:00pm up 267 days, 6:26, 4 users, load average: 5.44, 5.26, 5.17

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE/aq6DNYbTUIgzwfARAqOwAJkBlUboVeUzzfzb46LtyGqNGkeyiQ CfROqY
mTkCLVPwCABnniHh7FSAqS8=
=Cy6c
-----END PGP SIGNATURE-----

Nov 11 '05 #4

Tom Lane

Ang Chin Han <an***@bytecraft.com.my> writes:

Christopher Browne wrote:
When you use SUM (or other aggregates), there are no short cuts to
walking through each and every tuple specified by the WHERE clause.
Er... not in this case, if I read David's email correctly. His first query is walking through every tuple anyway.

No, it isn't, because he had a LIMIT. I think the real point is that
computing the first fifty groups requires sucking in a lot more tuples
than just computing the first fifty rows.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 11 '05 #5

by: Christian Seberino | last post by:

How does Ruby compare to Python?? How good is DESIGN of Ruby compared to Python? Python's design is godly. I'm wondering if Ruby's is godly too. I've heard it has solid OOP design but then...

Python

How does adding constraints affect performance?

by: Robin Tucker | last post by:

I'm considering adding domain integrity checks to some of my database table items. How does adding such constraints affect SQL Server performance? For example, I have a simple constraint that...

Microsoft SQL Server

What does it mean?

by: Jason Huang | last post by:

Hi, Would someone explain the following coding more detail for me? What's the ( ) for? CurrentText = (TextBox)e.Item.Cells.Controls; Thanks. Jason

C# / C Sharp

Does Static class affect performance of ASP.NET site

by: Raja Chandrasekaran | last post by:

Hai folks, I have a question to get exact answer from you people. My question is How Static class is differ from instance class and If you use static class in ASP.NET, ll it affect speed or...

C# / C Sharp

Overloaded delete does not calls the destructor??

by: rohits123 | last post by:

I have an overload delete operator as below ////////////////////////////////// void operator delete(void* mem,int head_type) { mmHead local_Head = CPRMemory::GetMemoryHead(head_type);...

C / C++

Viewstate error when adding user controls

by: Chris | last post by:

I am getting a viewstate error when adding dynamically user controls. SYSTEM_EXCEPTION:Failed to load viewstate. The control tree into which viewstate is being loaded must match the control tree...

ASP.NET

What does Microsoft write in .Net?

by: jim | last post by:

Maybe I'm missing something, but it doesn't look like Microsoft writes a lot of apps in .Net (although they certainly push it for others). What does MS write using pure .Net? If applications...

C# / C Sharp

How Does JavaScript Call Forth CSS??

by: Prisoner at War | last post by:

Okay, Folks, I guess my real burning concern all along is a "high-level" one: just how does JavaScript interact with CSS? Right now, my newbie self only knows JavaScript and CSS to *co-...

Javascript

process.destroy() does not kill the subprocess

by: itsraghz | last post by:

Dear All, I have an issue with destroy() method of java.lang.Process class. All what I am trying to do is, controlling the execution of one program through another. Let's say, Program B has to be...

Java

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

Why does adding SUM and GROUP BY destroy performance?

Similar topics