Connecting Tech Pros Worldwide Forums | Help | Site Map

Optimal Method for Getting Max Value

esmith2112
Guest
 
Posts: n/a
#1: Oct 29 '08
I have a non-performing query and am in need of tip or two to melt the
brain freeze I'm currently experiencing to get around it.

I have a large parent table, with an even larger child table (approx 5
rows per parent item). I want to join parent data with only one row
(the most current) from child table. The overall predicate for the
join is highly selective, but is very verbose and involves lots of
tables. I tried materializing the child's data several ways, but
always end up with a big table scan.

In its simplest form

TABLE PARENT
parent_id int (PK),
someinfo varchar(100)


TABLE CHILD
parent_id int (PK),
ts timestamp (PK)
otherinfo varchar(100)



select p.parent_id, p.someinfo, c.ts, c.otherinfo
from parent p
join (
select c.parent_id, max(c.ts) as max_ts
from child c
group by c.parent_id
) as max_child on max_child.parent_id = p.parent_id
join child c on max_child.parent_id = c.parent_id and max_child.max_ts
= c.ts
join <A BUNCH OF OTHER TABLES)
WHERE <LARGE PREDICATE>

The optimizer chooses a table scan for the inline table expression.
I've tried altering this same expression to use the rownumber() OLAP
function instead of the max() with similar slow results. I thought I
had seen, in past behavior, the optimizer's ability to push the
predicate down into the table expression. The only way I have found to
make this happen is to manually push the joins and predicates into the
table expression wherein the performance is restored.

So technically my problem is solved, but it makes for highly
unreadable verbose SQL. Isn't there another way represent that
functionality without the verbosity?

Thanks,
Evan

AIX 5.3 + DB2 8.2 FP16

--CELKO--
Guest
 
Posts: n/a
#2: Nov 2 '08

re: Optimal Method for Getting Max Value


Quote:
Quote:
>I have a large Parents [referenced?] table, with an even larger child [referencing?] table (approx 5 rows per Parents [referenced?] item). <<
You are using terminology FROM network databases AND NOT RDBMS. A
table with a singular name will have only one row it in; if it were a
SET, THEN there would be a plural OR collective name. This is very
basic data modeling, so what you have makes it hard to understand
Quote:
Quote:
>I want to join Parents [sic] data with only one row (the most current) from child [sic] table. The overall predicate for the join is highly selective, but is very verbose and involves lots of tables. I tried materializing the child's [sic] data several ways, but always end up with a big table scan. <<
Let me try to clean up your personal pseudo-code and amek guesses
about DRI:

CREATE TABLE Parents
(parent_id INTEGER NOT NULL PRIMARY KEY,
some_info VARCHAR (100) NOT NULL);

CREATE TABLE Children
(parent_id INTEGER NOT NULL
REFERENCES Parents (parent_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
event_time TIMESTAMP NOT NULL,
PRIMARY KEY (parent_id, event_time),
other_info VARCHAR(100) NOT NULL);

Did I guess right? Here is another guess, using the MAX() OVER()
option to get the desired child for each parent. Untested, of course.

SELECT X.parent_id, X.some_info, X.event_time, X.other_info
FROM (SELECT P.parent_id, P.some_info, C.event_time, C.other_info,
MAX(C.event_time)
OVER (PARTITION BY P.parent_id) AS event_time_max
FROM Parents AS P,
Children AS C,
<a bunch of other tables>
WHERE C.parent_id = P.parent_id
AND <some search conditions on other tables>)
AS X
WHERE X.event_time = X.event_time_max;
esmith2112
Guest
 
Posts: n/a
#3: Nov 5 '08

re: Optimal Method for Getting Max Value


Thanks for the tip. Sorry about the confusing scenario. You surmised
correctly on all counts. The actual tables and queries were just too
gargantuan to post and easily digest, so I tried to simplify with
hopes that the gist was transferred. When using the MAX () OVER ()
functionality, the performance has greatly improved and the query is
much easier on the eyes as well. Thanks for enriching my arsenal of
SQL techniques.

Best,
Evan



On Nov 2, 3:10 pm, --CELKO-- <jcelko...@earthlink.netwrote:
Quote:
>
Did I guess right? Here is another guess, using the MAX() OVER()
option to get the desired child for each parent. Untested, of course.
>
SELECT X.parent_id, X.some_info, X.event_time, X.other_info
FROM (SELECT P.parent_id, P.some_info, C.event_time, C.other_info,
MAX(C.event_time)
OVER (PARTITION BY P.parent_id) AS event_time_max
FROM Parents AS P,
Children AS C,
<a bunch of other tables>
WHERE C.parent_id = P.parent_id
AND <some search conditions on other tables>)
AS X
WHERE X.event_time = X.event_time_max;
--CELKO--
Guest
 
Posts: n/a
#4: Nov 5 '08

re: Optimal Method for Getting Max Value


> You surmised correctly on all counts. <<

That is a first :)!! I usually look for a Zebra instead of a horse.
Quote:
Quote:
>. When using the MAX () OVER () functionality, the performance has greatly improved and the query is much easier on the eyes as well. Thanks for enriching my arsenal of SQL techniques.<<
I am still trying to collect tricks with the new OVER() options for
another edition of SQL FOR SMARTIES in 4-5 years. If anyone comes up
with something, send it to me!! I have a feeling that there are some
good tricks that involve CASE expressions, DISTINCT aggregate
functions and some other stuff I have not found yet.
Closed Thread