By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
431,827 Members | 2,155 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 431,827 IT Pros & Developers. It's quick & easy.

FILTERING LEFT TABLE IN LEFT OUTER JOIN

P: n/a
I have two tables that are pretty big. I need about 10 rows in the
left table and the right table is filtered to 5 rows as well. It
seems when I join the tables in the FROM clause, I have to filter my
left table in the WHERE clause and cannot filter it in the FROM
clause.

This seems like it would cause a lot of overhead especially when my
left table is ten thousand rows. Am I wrong in thinking that the two
tables get joined with the FROM clause, causing a huge result set,
then get filtered with the WHERE clause?

Does anyone know if maybe I am doing this wrong or thinking about it
wrong?
For example, I want a query to return the number of dollars made in
the state of NY even if a store does not exist in the right table:

----------
table1
----------
store #
state

----------
table2
----------
store #
type
earnings

-WORKS CORRECTLY BUT SEEMS VERY EXPENSIVE IF LEFT AND RIGHT TABLES ARE
BIG
select store, coalesce(earnings, '$0') from table1 LEFT OUTER JOIN
table2
on (t1.store = t2.store and t2.type = 'RESTAURANT')
and t1.state = 'NY'

----------
result
----------
231, $0
232, $10000
233, $50000
234, $0
I cant just join the tables, otherwise, I would get this result set:

----------
result
----------
232, $10000
233, $50000

-I HAVE TRIED THE FOLLOWING, BUT IT DOESN'T WORK
select store, coalesce(earnings, '$0') from table1 LEFT OUTER JOIN
table2
on (t1.store = t2.store and t2.type = 'RESTAURANT' and t1.state =
'NY')

Any advice??????

May 11 '07 #1
Share this Question
Share on Google+
9 Replies


P: n/a
sh********@gmail.com wrote:
I have two tables that are pretty big. I need about 10 rows in the
left table and the right table is filtered to 5 rows as well. It
seems when I join the tables in the FROM clause, I have to filter my
left table in the WHERE clause and cannot filter it in the FROM
clause.

This seems like it would cause a lot of overhead especially when my
left table is ten thousand rows. Am I wrong in thinking that the two
tables get joined with the FROM clause, causing a huge result set,
then get filtered with the WHERE clause?

Does anyone know if maybe I am doing this wrong or thinking about it
wrong?
For example, I want a query to return the number of dollars made in
the state of NY even if a store does not exist in the right table:

----------
table1
----------
store #
state

----------
table2
----------
store #
type
earnings

-WORKS CORRECTLY BUT SEEMS VERY EXPENSIVE IF LEFT AND RIGHT TABLES ARE
BIG
select store, coalesce(earnings, '$0') from table1 LEFT OUTER JOIN
table2
on (t1.store = t2.store and t2.type = 'RESTAURANT')
and t1.state = 'NY'

----------
result
----------
231, $0
232, $10000
233, $50000
234, $0
I cant just join the tables, otherwise, I would get this result set:

----------
result
----------
232, $10000
233, $50000

-I HAVE TRIED THE FOLLOWING, BUT IT DOESN'T WORK
select store, coalesce(earnings, '$0') from table1 LEFT OUTER JOIN
table2
on (t1.store = t2.store and t2.type = 'RESTAURANT' and t1.state =
'NY')

Any advice??????
This approach should work, but the syntax may be NOT quite right:

select store, coalesce(earnings,0)
from (select * from t1 where state='NY') as ta
left outer
join (select * from t2 where type='RESTAURANT') as tb
on ta.store=tb.store
May 11 '07 #2

P: n/a
Ian
sh********@gmail.com wrote:
I have two tables that are pretty big. I need about 10 rows in the
left table and the right table is filtered to 5 rows as well. It
seems when I join the tables in the FROM clause, I have to filter my
left table in the WHERE clause and cannot filter it in the FROM
clause.
This won't make any difference. DB2 rewrites queries during
optimization -- so putting the predicates in the WHERE clause
as opposed join predicates won't matter.

Does anyone know if maybe I am doing this wrong or thinking about it
wrong?
You're thinking about it wrong.

Are you actually having a performance problem? Or you're just thinking
that there could be a problem?
Any advice??????
Have you done RUNSTATS on your tables? Do you have any indexes on your
tables? What does the access path look like?
May 11 '07 #3

P: n/a
I actually don't have a problem yet on our test boxes since our test
server doesn't contain nearly the amount of data in production. But
we will be sending LIVE soon and I was reviewing this code and it just
brought up a concern. Our LIVE system has much more data, so I just
wanted to make sure I understood things correctly. Thanks for the
advice.

However, I have to disagree with your conclusion as a whole because
using the FROM clause when limiting the right table makes a big
difference than when using the WHERE clause to filter the right
table. I have seen this first hand through testing with this query.

Seems like Bob's response would be efficient. Any thoughts??

May 11 '07 #4

P: n/a
SVK
On May 11, 2:55 am, Ian <ianb...@mobileaudio.comwrote:
shaneva...@gmail.com wrote:
I have two tables that are pretty big. I need about 10 rows in the
left table and the right table is filtered to 5 rows as well. It
seems when I join the tables in the FROM clause, I have to filter my
left table in the WHERE clause and cannot filter it in the FROM
clause.

This won't make any difference. DB2 rewrites queries during
optimization -- so putting the predicates in the WHERE clause
as opposed join predicates won't matter.
Does anyone know if maybe I am doing this wrong or thinking about it
wrong?

You're thinking about it wrong.

Are you actually having a performance problem? Or you're just thinking
that there could be a problem?
Any advice??????

Have you done RUNSTATS on your tables? Do you have any indexes on your
tables? What does the access path look like?
I fully agree with Ian:
1) It needs to be clearly stated: what indexes are available for those
original tables?
2) EXPLAIN must be executed on those various types of possible
queries, to fully understand the access path?
3) The access path may seriously change depending on RUNSTATS
performed or not, and on different RUNSTATS parameters.

Preliminary I can suggest, to improve performance of this particular
query, i.e. the original one:

select store,
coalesce(earnings, '$0')
from table1
LEFT OUTER
JOIN table2
on ( t1.store = t2.store
and t2.type = 'RESTAURANT'
)
and t1.state = 'NY'

The following indexes would be really useful:

create ... index ix1_table1
on table1
(
state asc,
store # asc
);

create ... index ix1_table2
on table2
(
store # asc,
type asc
) ;

Please try it.
I'm pretty sure you will see big difference in performance.

May 11 '07 #5

P: n/a
Hi.

sh********@gmail.com wrote:
However, I have to disagree with your conclusion as a whole because
using the FROM clause when limiting the right table makes a big
difference than when using the WHERE clause to filter the right
table. I have seen this first hand through testing with this query.
When you put a WHERE clause on the right table, it acts to make the
report behave as if it was an inner join. If you are filtering the
right table in the FROM clause, then I think DB2 needs to scan through
the left table (using any filter clauses from either the table
definition or the WHERE clause) and then join to the right table to
see if there are valid rows to connect to; if you filter the right
table in the WHERE clause, then DB2 can re-write the query to first
pick up the right table using that filter clause, and join into the
left table. So, there could be a big performance difference. In any
case, I would expect there to be significant results differences.

-Chris

May 11 '07 #6

P: n/a
Ian
sh********@gmail.com wrote:
I actually don't have a problem yet on our test boxes since our test
server doesn't contain nearly the amount of data in production. But
we will be sending LIVE soon and I was reviewing this code and it just
brought up a concern. Our LIVE system has much more data, so I just
wanted to make sure I understood things correctly. Thanks for the
advice.

However, I have to disagree with your conclusion as a whole because
using the FROM clause when limiting the right table makes a big
difference than when using the WHERE clause to filter the right
table. I have seen this first hand through testing with this query.
OK. But answer the questions I posted earlier. If we're talking
about an untuned system where the optimizer has to make guesses, seeing
big differences is certainly a possibility.
Seems like Bob's response would be efficient. Any thoughts??
Again, this is completely dependent on how DB2 decides to rewrite the
query.

A long time ago, it used to be that you could take advantage of
"opportunities" within the optimizer to cause certain behavior.
However, the optimizer query-rewrite phase is generally a lot
smarter than any of us. So the chances of you being able to outsmart
it are slim.

And of course, you can verify this by simply looking at the explains.
Ian
May 11 '07 #7

P: n/a
thanks for all the feedback. this is very helpful. I am not sure on
the statistics nor am I sure indexes. I have to get the DBA's
involved since I do not have access to do this stuff on the boxes. I
will take what I have found out here and apply and research. Thanks
again!

May 12 '07 #8

P: n/a
On May 11, 9:06 am, shaneva...@gmail.com wrote:
-WORKS CORRECTLY BUT SEEMS VERY EXPENSIVE IF LEFT AND RIGHT TABLES ARE
BIG
select store, coalesce(earnings, '$0') from table1 LEFT OUTER JOIN
table2
on (t1.store = t2.store and t2.type = 'RESTAURANT')
and t1.state = 'NY'

----------
result
----------
231, $0
232, $10000
233, $50000
234, $0
I doubt this query works corrrectly. Because, ON condition of LEFT
OUTER JOIN specifies selection condition of rows of RIGHT TABLE, and
all rows of LEFT TABLE would be selected even if the ON condition
includes predicate for LEFT TABLE(In your example t1.state = 'NY').
I want to recommend you to add rows t1.state <'NY' then execute the
Query again. You will see to be selected those added rows with
t1.state <'NY', even you had specified " and t1.state = 'NY' ".
While rows of RIGHT TABLE must not be selected even satisfied
predicates for RIGHT TABLE(i.e. t1.store = t2.store and t2.type =
'RESTAURANT').
If you want select LEFT TABLE, you should specify selection condition
in WHERE caluse. If wrote in WHERE clause DB2 is enough clever(as Ian
wrote). So, you need not worry about performance.
Indexes recommended by SVK must be effective.
Again, I want to recommend you to try following Query with that
indexes and see access path by EXPLAIN.
select store, coalesce(earnings, '$0')
from table1 LEFT OUTER JOIN
table2
on t1.store = t2.store and t2.type = 'RESTAURANT'
WHERE t1.state = 'NY'
;
Another point is usuallly parentheses in conditions are not influenced
on access path(then performance), if all predicate are connected by
only AND or only OR(In other words there's no mixture of AND and OR).
Parentheses are only influenced on readability of the conditions, if
predicates are not connected with mixture of AND and OR.

May 13 '07 #9

P: n/a
On 10 May 2007 17:06:49 -0700, sh********@gmail.com wrote:
>I have two tables that are pretty big. I need about 10 rows in the
left table and the right table is filtered to 5 rows as well. It
seems when I join the tables in the FROM clause, I have to filter my
left table in the WHERE clause and cannot filter it in the FROM
clause.

This seems like it would cause a lot of overhead especially when my
left table is ten thousand rows. Am I wrong in thinking that the two
tables get joined with the FROM clause, causing a huge result set,
then get filtered with the WHERE clause?

Does anyone know if maybe I am doing this wrong or thinking about it
wrong?
For example, I want a query to return the number of dollars made in
the state of NY even if a store does not exist in the right table:

----------
table1
----------
store #
state

----------
table2
----------
store #
type
earnings

-WORKS CORRECTLY BUT SEEMS VERY EXPENSIVE IF LEFT AND RIGHT TABLES ARE
BIG
select store, coalesce(earnings, '$0') from table1 LEFT OUTER JOIN
table2
on (t1.store = t2.store and t2.type = 'RESTAURANT')
and t1.state = 'NY'

----------
result
----------
231, $0
232, $10000
233, $50000
234, $0
I cant just join the tables, otherwise, I would get this result set:

----------
result
----------
232, $10000
233, $50000

-I HAVE TRIED THE FOLLOWING, BUT IT DOESN'T WORK
select store, coalesce(earnings, '$0') from table1 LEFT OUTER JOIN
table2
on (t1.store = t2.store and t2.type = 'RESTAURANT' and t1.state =
'NY')

Any advice??????
Depending on the amount of information involved, a subquery might be
the better solution:

SELECT
Store,
(
SELECT
COASLESCE(Earnings, '$0')
FROM
Table2
WHERE
Table2.Store = Table1.Store
AND Table2.Type = 'RESTAURANT'
)
FROM
Table1
WHERE
Table1.State = 'NY'

B.
May 16 '07 #10

This discussion thread is closed

Replies have been disabled for this discussion.