Why does adding SUM and GROUP BY destroy performance?

Hi,

Why does adding SUM and GROUP BY destroy performance?
details follow.
Thanks, David Link

s1.sql:
SELECT
    t.tid, t.title,
    COALESCE(s0c100r100.units, 0) AS w0c100r100units,
    (COALESCE(r1c2r100.units, 0) + COALESCE(y0c2r100.units, 0))
        AS r0c2r100units
FROM
    title t
    JOIN upc u1 ON t.tid = u1.tid
    LEFT OUTER JOIN sale_200331 s0c100r100 ON u1.upc = s0c100r100.upc
        AND s0c100r100.week = 200331 AND s0c100r100.channel = 100
        AND s0c100r100.region = 100
    LEFT OUTER JOIN rtd r1c2r100 ON u1.upc = r1c2r100.upc
        AND r1c2r100.year = 2002 AND r1c2r100.channel = 2
        AND r1c2r100.region = 100
    LEFT OUTER JOIN ytd_200331 y0c2r100 ON u1.upc = y0c2r100.upc
        AND y0c2r100.week = 200331 AND y0c2r100.channel = 2
        AND y0c2r100.region = 100
    LEFT OUTER JOIN media m ON t.media = m.key
    LEFT OUTER JOIN screen_format sf ON t.screen_format = sf.key
WHERE
    t.distributor != 'CONTROL LABEL'
ORDER BY
    t.title ASC
LIMIT 50
;
s2.sql:
SELECT
    t.tid, t.title,
    SUM(COALESCE(s0c100r100.units, 0)) AS w0c100r100units,
    SUM((COALESCE(r1c2r100.units, 0) + COALESCE(y0c2r100.units, 0)))
        AS r0c2r100units
FROM
    title t
    JOIN upc u1 ON t.tid = u1.tid
    LEFT OUTER JOIN sale_200331 s0c100r100 ON u1.upc = s0c100r100.upc
        AND s0c100r100.week = 200331 AND s0c100r100.channel = 100
        AND s0c100r100.region = 100
    LEFT OUTER JOIN rtd r1c2r100 ON u1.upc = r1c2r100.upc
        AND r1c2r100.year = 2002 AND r1c2r100.channel = 2
        AND r1c2r100.region = 100
    LEFT OUTER JOIN ytd_200331 y0c2r100 ON u1.upc = y0c2r100.upc
        AND y0c2r100.week = 200331 AND y0c2r100.channel = 2
        AND y0c2r100.region = 100
    LEFT OUTER JOIN media m ON t.media = m.key
    LEFT OUTER JOIN screen_format sf ON t.screen_format = sf.key
WHERE
    t.distributor != 'CONTROL LABEL'
GROUP BY
    t.tid, t.title
ORDER BY
    t.title ASC
LIMIT 50
;
Times:
s1.sql takes 0m0.124s
s2.sql takes 1m1.450s

Stats:
title table: 68,000 rows
sale_200331 table: 150,000 rows
ytd_200331 table: 0 rows
rtd table: 650,000 rows

Indexes are in place.

s1 explain plan:
QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=0.00..65105.51 rows=50 width=132)
   ->  Nested Loop  (cost=0.00..91726868.54 rows=70445 width=132)
         Join Filter: ("outer".screen_format = "inner"."key")
         ->  Nested Loop  (cost=0.00..91651668.74 rows=70445 width=127)
               Join Filter: ("outer".media = "inner"."key")
               ->  Nested Loop  (cost=0.00..91578053.95 rows=70445 width=122)
                     ->  Nested Loop  (cost=0.00..91236359.89 rows=70445 width=98)
                           ->  Nested Loop  (cost=0.00..90894665.82 rows=70445 width=74)
                                 ->  Nested Loop  (cost=0.00..90539626.76 rows=70445 width=50)
                                       ->  Index Scan using title_title_ind on title t  (cost=0.00..193051.67 rows=68775 width=38)
                                             Filter: (distributor <> 'CONTROL LABEL'::character varying)
                                       ->  Index Scan using davids_tid_index on upc u1  (cost=0.00..1309.24 rows=353 width=12)
                                             Index Cond: ("outer".tid = u1.tid)
                                 ->  Index Scan using sale_200331_upc_wk_chl_reg_ind on sale_200331 s0c100r100  (cost=0.00..5.02 rows=1 width=24)
                                       Index Cond: (("outer".upc = s0c100r100.upc) AND (s0c100r100.week = 200331) AND (s0c100r100.channel = 100) AND (s0c100r100.region = 100))
                           ->  Index Scan using rtd_upc_year_chl_reg_ind on rtd r1c2r100  (cost=0.00..4.83 rows=1 width=24)
                                 Index Cond: (("outer".upc = r1c2r100.upc) AND (r1c2r100."year" = 2002) AND (r1c2r100.channel = 2) AND (r1c2r100.region = 100))
                     ->  Index Scan using ytd_200331_upc_wkchlreg_ind on ytd_200331 y0c2r100  (cost=0.00..4.83 rows=1 width=24)
                           Index Cond: (("outer".upc = y0c2r100.upc) AND (y0c2r100.week = 200331) AND (y0c2r100.channel = 2) AND (y0c2r100.region = 100))
               ->  Seq Scan on media m  (cost=0.00..1.02 rows=2 width=5)
         ->  Seq Scan on screen_format sf  (cost=0.00..1.03 rows=3 width=5)
(21 rows)

s2 explain plan:

QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=403996.99..403997.11 rows=50 width=132)
   ->  Sort  (cost=403996.99..404014.60 rows=7044 width=132)
         Sort Key: t.title
         ->  Aggregate  (cost=402393.74..403274.30 rows=7044 width=132)
               ->  Group  (cost=402393.74..402922.08 rows=70445 width=132)
                     ->  Sort  (cost=402393.74..402569.86 rows=70445 width=132)
                           Sort Key: t.tid, t.title
                           ->  Hash Join  (cost=375382.76..392011.46 rows=70445 width=132)
                                 Hash Cond: ("outer".screen_format = "inner"."key")
                                 ->  Hash Join  (cost=375381.72..390997.78 rows=70445 width=127)
                                       Hash Cond: ("outer".media = "inner"."key")
                                       ->  Merge Join  (cost=375380.70..390057.49 rows=70445 width=122)
                                             Merge Cond: ("outer".upc = "inner".upc)
                                             Join Filter: (("inner".week = 200331) AND ("inner".channel = 2) AND ("inner".region = 100))
                                             ->  Merge Join  (cost=375380.70..382782.40 rows=70445 width=98)
                                                   Merge Cond: ("outer".upc = "inner".upc)
                                                   Join Filter: (("inner"."year" = 2002) AND ("inner".channel = 2) AND ("inner".region = 100))
                                                   ->  Sort  (cost=375310.87..375486.98 rows=70445 width=74)
                                                         Sort Key: u1.upc
                                                         ->  Nested Loop  (cost=6348.20..367282.53 rows=70445 width=74)
                                                               ->  Hash Join  (cost=6348.20..12243.46 rows=70445 width=50)
                                                                     Hash Cond: ("outer".tid = "inner".tid)
                                                                     ->  Seq Scan on upc u1  (cost=0.00..2795.28 rows=70628 width=12)
                                                                     ->  Hash  (cost=4114.93..4114.93 rows=68775 width=38)
                                                                           ->  Seq Scan on title t  (cost=0.00..4114.93 rows=68775 width=38)
                                                                                 Filter: (distributor <> 'CONTROL LABEL'::character varying)
                                                               ->  Index Scan using sale_200331_upc_wk_chl_reg_ind on sale_200331 s0c100r100  (cost=0.00..5.02 rows=1 width=24)
                                                                     Index Cond: (("outer".upc = s0c100r100.upc) AND (s0c100r100.week = 200331) AND (s0c100r100.channel = 100) AND (s0c100r100.region = 100))
                                                   ->  Sort  (cost=69.83..72.33 rows=1000 width=24)
                                                         Sort Key: r1c2r100.upc
                                                         ->  Seq Scan on rtd r1c2r100  (cost=0.00..20.00 rows=1000 width=24)
                                             ->  Index Scan using ytd_200331_upc_wkchlreg_ind on ytd_200331 y0c2r100  (cost=0.00..52.00 rows=1000 width=24)
                                       ->  Hash  (cost=1.02..1.02 rows=2 width=5)
                                             ->  Seq Scan on media m  (cost=0.00..1.02 rows=2 width=5)
                                 ->  Hash  (cost=1.03..1.03 rows=3 width=5)
                                       ->  Seq Scan on screen_format sf  (cost=0.00..1.03 rows=3 width=5)
(36 rows)



Nov 11 '05 #1
On Wed, 2003-09-17 at 12:51, David Link wrote:
> Hi,
>
> Why does adding SUM and GROUP BY destroy performance?
> details follow.
> Thanks, David Link

PostgreSQL's generalized UDFs make optimizing for the standard
aggregates very hard to do. There should be some improvement in
v7.4, though.

--
Ron Johnson, Jr. ro***********@cox.net
Jefferson, LA USA

Nov 11 '05 #2
In the last exciting episode, dv****@yahoo.com (David Link) wrote:
> Why does adding SUM and GROUP BY destroy performance?


When you use SUM (or other aggregates), there are no short cuts to
walking through each and every tuple specified by the WHERE clause.

On some systems there are statistics mechanisms that can short-circuit
that. On PostgreSQL, the use of MVCC to let new data "almost
magically appear" :-) has the demerit, in the case of aggregates, of
not leaving much opening for short cuts.

There are some cases where you CAN do much better than the aggregates
do.

SELECT MAX(FIELD) FROM TABLE WHERE A='THIS' and B='THAT';

may be replaced with the likely-to-be-faster:

select field from table where a = 'THIS' and b='THAT' order by field
desc limit 1;

MIN() admits a similar rewriting. If there is an index on FIELD, this
will likely be _way_ faster than using MIN()/MAX().

In a sense, it's not that aggregates "destroy" performance; just that
there are no magical shortcuts to make them incredibly fast.
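
The MIN()/MAX() rewrite above can be tried end to end with a small sketch. Everything here is an assumption made for illustration: the table name readings, its columns, the index name, and the use of SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL. It only checks that the two forms return the same value; it says nothing about how either planner will cost them.

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (a TEXT, b TEXT, field INTEGER)")
conn.execute("CREATE INDEX readings_a_b_field_idx ON readings (a, b, field)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("THIS", "THAT", 3),
        ("THIS", "THAT", 17),
        ("THIS", "THAT", None),   # NULL: ignored by MIN()/MAX()
        ("THIS", "OTHER", 99),    # filtered out by the WHERE clause
        ("ELSE", "THAT", 42),     # filtered out by the WHERE clause
    ],
)

where = "WHERE a = 'THIS' AND b = 'THAT'"

# Aggregate forms.
max_via_aggregate = conn.execute(
    f"SELECT MAX(field) FROM readings {where}"
).fetchone()[0]
min_via_aggregate = conn.execute(
    f"SELECT MIN(field) FROM readings {where}"
).fetchone()[0]

# ORDER BY / LIMIT forms. The IS NOT NULL filter keeps a NULL from
# sorting to the front and being returned instead of a real value.
max_via_limit = conn.execute(
    f"SELECT field FROM readings {where} AND field IS NOT NULL "
    "ORDER BY field DESC LIMIT 1"
).fetchone()[0]
min_via_limit = conn.execute(
    f"SELECT field FROM readings {where} AND field IS NOT NULL "
    "ORDER BY field ASC LIMIT 1"
).fetchone()[0]

print(max_via_aggregate, max_via_limit)
print(min_via_aggregate, min_via_limit)
```

Two caveats on the rewrite: NULLs sort first in one direction or the other depending on the database (in PostgreSQL they come first under ORDER BY ... DESC by default), hence the IS NOT NULL filter; and when no row matches, MAX() still returns one row containing NULL while the LIMIT form returns no rows at all.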
Nov 11 '05 #3
Christopher Browne wrote:
> In the last exciting episode, dv****@yahoo.com (David Link) wrote:
>> Why does adding SUM and GROUP BY destroy performance?
>
> When you use SUM (or other aggregates), there are no short cuts to
> walking through each and every tuple specified by the WHERE clause.


Er... not in this case, if I read David's email correctly.

His first query is walking through every tuple anyway.
His second query is the one summing them up, AFTER, here's the critical
part, GROUPing them by t.tid.

I suspect 7.4 (now in beta), or rewriting the query for <7.4, would speed
things up. 7.4's Hash Aggregate would be the winner here.

As for rewriting this, David, try:

SELECT t.tid, t.title,
(select the stuff you want from lots of tables where something = t.tid)
FROM
title t;

Doubt it'll be as fast as using 7.4 though.
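
One way the subselect rewrite sketched above might look, on a cut-down schema. The three tables below (title, upc, sale) and their contents are made up for illustration and are much simpler than David's; SQLite via Python's sqlite3 module stands in for PostgreSQL, so this checks only that the two query shapes agree, not how fast either one runs.

```python
import sqlite3

# Cut-down, hypothetical schema; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE title (tid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE upc   (upc INTEGER PRIMARY KEY, tid INTEGER);
    CREATE TABLE sale  (upc INTEGER, week INTEGER, units INTEGER);
    CREATE INDEX upc_tid_idx ON upc (tid);
    CREATE INDEX sale_upc_week_idx ON sale (upc, week);

    INSERT INTO title VALUES (1, 'Alien'), (2, 'Brazil'), (3, 'Casablanca');
    INSERT INTO upc VALUES (10, 1), (11, 1), (20, 2), (30, 3);
    INSERT INTO sale VALUES (10, 200331, 5), (11, 200331, 7),
                            (20, 200331, 2), (20, 200330, 100);
""")

# Join everything, then GROUP BY (the shape of s2.sql).
grouped = conn.execute("""
    SELECT t.tid, t.title, SUM(COALESCE(s.units, 0)) AS units
    FROM title t
    JOIN upc u ON t.tid = u.tid
    LEFT OUTER JOIN sale s ON u.upc = s.upc AND s.week = 200331
    GROUP BY t.tid, t.title
    ORDER BY t.title ASC
    LIMIT 2
""").fetchall()

# Correlated subselect: the sum is computed per title row, so no
# GROUP BY is needed and LIMIT applies directly to title rows.
subselect = conn.execute("""
    SELECT t.tid, t.title,
           (SELECT COALESCE(SUM(s.units), 0)
              FROM upc u
              JOIN sale s ON u.upc = s.upc
             WHERE u.tid = t.tid AND s.week = 200331) AS units
    FROM title t
    ORDER BY t.title ASC
    LIMIT 2
""").fetchall()

print(grouped)
print(subselect)
```

The two forms are not equivalent in general: a title with no row in upc is dropped by the inner JOIN in the grouped query but shows up with 0 units in the subselect version. In this toy data every title has at least one UPC, so the results match.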


Nov 11 '05 #4
Ang Chin Han <an***@bytecraft.com.my> writes:
> Christopher Browne wrote:
>> When you use SUM (or other aggregates), there are no short cuts to
>> walking through each and every tuple specified by the WHERE clause.
>
> Er... not in this case, if I read David's email correctly.
> His first query is walking through every tuple anyway.


No, it isn't, because he had a LIMIT. I think the real point is that
computing the first fifty groups requires sucking in a lot more tuples
than just computing the first fifty rows.

regards, tom lane
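
If the cost really comes from building every group before the LIMIT is applied, one possible workaround (not proposed anywhere in this thread, so treat it as an assumption) is to pick the first fifty titles in a subquery and only then join and aggregate. The sketch below uses a made-up, cut-down schema and SQLite via Python's sqlite3 module as a stand-in for PostgreSQL; it shows that the two forms agree on this data, not that either is faster on a given server.

```python
import sqlite3

# Hypothetical cut-down schema; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE title (tid INTEGER PRIMARY KEY, title TEXT, distributor TEXT);
    CREATE TABLE upc   (upc INTEGER PRIMARY KEY, tid INTEGER);
    CREATE TABLE sale  (upc INTEGER, week INTEGER, units INTEGER);
    CREATE INDEX title_title_idx ON title (title);
    CREATE INDEX upc_tid_idx ON upc (tid);
    CREATE INDEX sale_upc_week_idx ON sale (upc, week);
""")

# 200 titles with unique names, two UPCs each, one sale row per UPC.
for tid in range(1, 201):
    conn.execute(
        "INSERT INTO title VALUES (?, ?, ?)", (tid, f"Title {tid:04d}", "ACME")
    )
    for k in (0, 1):
        upc = tid * 10 + k
        conn.execute("INSERT INTO upc VALUES (?, ?)", (upc, tid))
        conn.execute("INSERT INTO sale VALUES (?, ?, ?)", (upc, 200331, tid + k))

# Shape of s2.sql: aggregate all groups, then sort and LIMIT.
direct = conn.execute("""
    SELECT t.tid, t.title, SUM(COALESCE(s.units, 0)) AS units
    FROM title t
    JOIN upc u ON t.tid = u.tid
    LEFT OUTER JOIN sale s ON u.upc = s.upc AND s.week = 200331
    WHERE t.distributor != 'CONTROL LABEL'
    GROUP BY t.tid, t.title
    ORDER BY t.title ASC
    LIMIT 50
""").fetchall()

# Pre-limited: choose the 50 titles first, then join and aggregate
# only the rows that belong to them.
prelimited = conn.execute("""
    SELECT t.tid, t.title, SUM(COALESCE(s.units, 0)) AS units
    FROM (SELECT tid, title
            FROM title
           WHERE distributor != 'CONTROL LABEL'
           ORDER BY title ASC
           LIMIT 50) t
    JOIN upc u ON t.tid = u.tid
    LEFT OUTER JOIN sale s ON u.upc = s.upc AND s.week = 200331
    GROUP BY t.tid, t.title
    ORDER BY t.title ASC
""").fetchall()

print(len(direct), len(prelimited))
```

The rewrite returns the same rows only under two conditions that hold in this toy data and may not hold in David's: every title has at least one row in upc (otherwise the inner JOIN drops some of the fifty pre-selected titles and fewer rows come back), and titles are unique (otherwise ties at the LIMIT boundary can be broken differently).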


Nov 11 '05 #5
