By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
424,846 Members | 1,211 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 424,846 IT Pros & Developers. It's quick & easy.

SQL Query x Multiple table Scans

P: n/a
Hello,

Db2 Linux LUW FP 15.

Consider table A (ID varchar, EXECUTION_DATE date).

a) I want to first retrieve all IDs that have not executed during the
last 90 days:

select distinct ID from table_B where NOT EXISTS
(select 1
from table_A
where EXECUTION_DATE between current date - 90 days and current
date AND
table_A.ID = table_B.ID)

b) Now I need the last execution for all IDs that have NOT executed
during the last 90 days:

select TABLE_A.ID, MAX(EXECUTION_DATE) from TABLE_A,

(
select distinct ID from table_B where NOT EXISTS
(select 1
from table_A
where EXECUTION_DATE between current date - 90 days and current
date AND
table_A.ID = table_B.ID)
) as TABLE_C

where TABLE_A.ID = TABLE_C.ID and
EXECUTION_DATE < current date - 90 days

group by TABLE_A.ID

That was easy. The question is ... is there any way to do the same
thing using a better approach, perhaps SINGLE table scan on table_A ?
This table is really big.

TIA, Michel.
Nov 30 '07 #1
Share this Question
Share on Google+
12 Replies


P: n/a
On Nov 30, 6:33 pm, Michel Esber <mic...@us.automatos.comwrote:
Hello,

Db2 Linux LUW FP 15.

Consider table A (ID varchar, EXECUTION_DATE date).

a) I want to first retrieve all IDs that have not executed during the
last 90 days:

select distinct ID from table_B where NOT EXISTS
(select 1
from table_A
where EXECUTION_DATE between current date - 90 days and current
date AND
table_A.ID = table_B.ID)

b) Now I need the last execution for all IDs that have NOT executed
during the last 90 days:

select TABLE_A.ID, MAX(EXECUTION_DATE) from TABLE_A,

(
select distinct ID from table_B where NOT EXISTS
(select 1
from table_A
where EXECUTION_DATE between current date - 90 days and current
date AND
table_A.ID = table_B.ID)
) as TABLE_C

where TABLE_A.ID = TABLE_C.ID and
EXECUTION_DATE < current date - 90 days

group by TABLE_A.ID

That was easy. The question is ... is there any way to do the same
thing using a better approach, perhaps SINGLE table scan on table_A ?
This table is really big.

TIA, Michel.
What indexes do you have on table_A and table_B, and what access plan
do you currently have?

/Lennart

Nov 30 '07 #2

P: n/a
On Nov 30, 10:59 pm, Lennart <Erik.Lennart.Jons...@gmail.comwrote:
On Nov 30, 6:33 pm, Michel Esber <mic...@us.automatos.comwrote:
Hello,
Db2 Linux LUW FP 15.
Consider table A (ID varchar, EXECUTION_DATE date).
a) I want to first retrieve all IDs that have not executed during the
last 90 days:
select distinct ID from table_B where NOT EXISTS
(select 1
from table_A
where EXECUTION_DATE between current date - 90 days and current
date AND
table_A.ID = table_B.ID)
b) Now I need the last execution for all IDs that have NOT executed
during the last 90 days:
select TABLE_A.ID, MAX(EXECUTION_DATE) from TABLE_A,
(
select distinct ID from table_B where NOT EXISTS
(select 1
from table_A
where EXECUTION_DATE between current date - 90 days and current
date AND
table_A.ID = table_B.ID)
) as TABLE_C
where TABLE_A.ID = TABLE_C.ID and
EXECUTION_DATE < current date - 90 days
group by TABLE_A.ID
That was easy. The question is ... is there any way to do the same
thing using a better approach, perhaps SINGLE table scan on table_A ?
This table is really big.
TIA, Michel.

What indexes do you have on table_A and table_B, and what access plan
do you currently have?
In addition, what are the number of rows for A and B?

/Lennart

Nov 30 '07 #3

P: n/a
Maybe I'm missing something obvious....:

SELECT ID, MAX(EXECUTION_DATE) FROM TABLEA GROUP BY ID
HAVING MAX(EXECUTION_DATE) < CURRENT DATE - 90 DAYS

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Dec 1 '07 #4

P: n/a
On 1 dez, 00:17, Serge Rielau <srie...@ca.ibm.comwrote:
Maybe I'm missing something obvious....:

SELECT ID, MAX(EXECUTION_DATE) FROM TABLEA GROUP BY ID
HAVING MAX(EXECUTION_DATE) < CURRENT DATE - 90 DAYS

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Hi Serge,

In fact, this query does return the last execution prior to current
date - 90 days, but it does not consider that IDs should not execute
between current date - 90 days and current date

The above query returns IDs that have and have not executed between
current date - 90 days and current date.

The query I have posted works fine. I am looking for a way to tune
it ... Is there any way to produce the same result set with a single
table scan ?

TIA, Michel.
Dec 2 '07 #5

P: n/a
On 30 nov, 20:00, Lennart <Erik.Lennart.Jons...@gmail.comwrote:
On Nov 30, 10:59 pm, Lennart <Erik.Lennart.Jons...@gmail.comwrote:


On Nov 30, 6:33 pm, Michel Esber <mic...@us.automatos.comwrote:
Hello,
Db2 Linux LUW FP 15.
Consider table A (ID varchar, EXECUTION_DATE date).
a) I want to first retrieve all IDs that have not executed during the
last 90 days:
select distinct ID from table_B where NOT EXISTS
(select 1
from table_A
where EXECUTION_DATE between current date - 90 days and current
date AND
table_A.ID = table_B.ID)
b) Now I need the last execution for all IDs that have NOT executed
during the last 90 days:
select TABLE_A.ID, MAX(EXECUTION_DATE) from TABLE_A,
(
select distinct ID from table_B where NOT EXISTS
(select 1
from table_A
where EXECUTION_DATE between current date - 90 days and current
date AND
table_A.ID = table_B.ID)
) as TABLE_C
where TABLE_A.ID = TABLE_C.ID and
EXECUTION_DATE < current date - 90 days
group by TABLE_A.ID
That was easy. The question is ... is there any way to do the same
thing using a better approach, perhaps SINGLE table scan on table_A ?
This table is really big.
TIA, Michel.
What indexes do you have on table_A and table_B, and what access plan
do you currently have?

In addition, what are the number of rows for A and B?

/Lennart

Original Statement:
------------------

select R.MACHINE_ID, R.SW_ID, MAX(MONITOR_DATE)

from ASSET.T_METERING R inner join

(
select distinct I.MACHINE_ID, I.SW_ID
from ASSET.TBL_ASSET_SW I
where NOT EXISTS
(select 1
from ASSET.T_METERING M
where MONITOR_DATE between current date - 90 days and current
date and
M. MACHINE_ID = I.MACHINE_ID and
M.SW_ID = I.SW_ID
fetch first 1 rows only)
) as C

on (C.MACHINE_ID = R.MACHINE_ID and C.SW_ID = R.SW_ID)

where MONITOR_DATE < current date - 90 days

group by R.MACHINE_ID, R.SW_ID


Access Plan:
-----------
Total Cost: 3507.52
Query Degree: 1

Rows
RETURN
( 1)
Cost
I/O
|
1.60851e-05
GRPBY
( 2)
3507.52
325.697
|
1.60851e-05
TBSCAN
( 3)
3507.52
325.697
|
0.00401062
SORT
( 4)
3507.51
325.697
|
0.00401062
NLJOIN
( 5)
3507.51
325.697
/---+---\
249.338 0.0172951
TBSCAN IXSCAN
( 6) ( 13)
2514.15 25.0315
196.182 1
| |
249.338 18785
SORT INDEX: ASSET
( 7) MTRIDX03
2514.12
196.182
|
249.338
TBSCAN
( 8)
2513.97
196.182
|
249.338
SORT
( 9)
2513.94
196.182
|
249.338
HSJOIN
( 10)
2513.78
196.182
/------+------\
3500.53 4325
IXSCAN IXSCAN
( 11) ( 12)
1034.09 1478.12
137.182 59
| |
18785 4325
INDEX: ASSET INDEX: SYSIBM
MTRIDX03 SQL061207070618270
Indexes:

Table = T_METERING, index = ASSET.MTRIDX03 Columns = +SW_ID
+MONITOR_DATE+MACHINE_ID
Table = TBL_ASSET_SW, index = SYSIBM.SQL061207070618270 Columns =
+MACHINE_ID+SW_ID

T_METERING has 20M rows, TBL_ASSET_SW has 5M rows.

Thanks, Michel.
Dec 2 '07 #6

P: n/a
On Nov 30, 9:33 am, Michel Esber <mic...@us.automatos.comwrote:
Hello,

Db2 Linux LUW FP 15.

Consider table A (ID varchar, EXECUTION_DATE date).

a) I want to first retrieve all IDs that have not executed during the
last 90 days:

select distinct ID from table_B where NOT EXISTS
(select 1
from table_A
where EXECUTION_DATE between current date - 90 days and current
date AND
table_A.ID = table_B.ID)

b) Now I need the last execution for all IDs that have NOT executed
during the last 90 days:

select TABLE_A.ID, MAX(EXECUTION_DATE) from TABLE_A,

(
select distinct ID from table_B where NOT EXISTS
(select 1
from table_A
where EXECUTION_DATE between current date - 90 days and current
date AND
table_A.ID = table_B.ID)
) as TABLE_C

where TABLE_A.ID = TABLE_C.ID and
EXECUTION_DATE < current date - 90 days

group by TABLE_A.ID

That was easy. The question is ... is there any way to do the same
thing using a better approach, perhaps SINGLE table scan on table_A ?
This table is really big.
I read the rest of the thread. I'll see what I can do with this.

First, let's restate the problem: you want the set of all IDs that
have not executed in the last 90 days, and you want the date of their
last execution.

-- id, and the last execution date of all ids that join to the inner
query
SELECT a.id,
max(execution_date)
FROM table_a a
JOIN (
-- the set of all ids that have not executed in the last 90
days
SELECT id
FROM table_a
GROUP BY id
HAVING MAX(execution_date) < CURRENT DATE - 90 DAYS
) lt90
ON a.id = lt90.id

In your last post on this, it looks like you're interested in SW_ID
and MACHINE_ID, and there is an index that contains those columns plus
the date you're interested in, so I would expect this query to be
reasonably efficient.
Dec 2 '07 #7

P: n/a


Michel Esber wrote:
On 1 dez, 00:17, Serge Rielau <srie...@ca.ibm.comwrote:
>Maybe I'm missing something obvious....:

SELECT ID, MAX(EXECUTION_DATE) FROM TABLEA GROUP BY ID
HAVING MAX(EXECUTION_DATE) < CURRENT DATE - 90 DAYS

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

Hi Serge,

In fact, this query does return the last execution prior to current
date - 90 days, but it does not consider that IDs should not execute
between current date - 90 days and current date

The above query returns IDs that have and have not executed between
current date - 90 days and current date.

The query I have posted works fine. I am looking for a way to tune
it ... Is there any way to produce the same result set with a single
table scan ?

TIA, Michel.
Serge's query works properly on my LUW FP14 system. I'd recommend trying
a small test case (a sample table containing three rows) to verify that
this is failing on your system. If it's failing, then it needs to be
reported. I strongly doubt that FP15 will break this simple SQL statement.

[psherman@T40 ~]$ db2 "create table t1 (id char(2) not null,
execution_date date not null with default)"
DB20000I The SQL command completed successfully.
[psherman@T40 ~]$ db2 "insert into t1 values ('A','2007-12-01')"
DB20000I The SQL command completed successfully.
[psherman@T40 ~]$ db2 "insert into t1 values ('A','2006-12-01')"
DB20000I The SQL command completed successfully.
[psherman@T40 ~]$ db2 "insert into t1 values ('B','2006-12-01')"
DB20000I The SQL command completed successfully.
[psherman@T40 ~]$ db2 "select id, max(execution_date) from t1 group by id"
ID 2
-- ----------
A 12/01/2007
B 12/01/2006

2 record(s) selected.

[psherman@T40 ~]$ db2 "select id, max(execution_date) from t1 group by
id having max(execution_date) < current_date - 90 days"

ID 2
-- ----------
B 12/01/2006

1 record(s) selected.
Phil Sherman
Dec 2 '07 #8

P: n/a
On 2 dez, 13:01, Phil Sherman <psher...@ameritech.netwrote:
Michel Esber wrote:
On 1 dez, 00:17, Serge Rielau <srie...@ca.ibm.comwrote:
Maybe I'm missing something obvious....:
SELECT ID, MAX(EXECUTION_DATE) FROM TABLEA GROUP BY ID
HAVING MAX(EXECUTION_DATE) < CURRENT DATE - 90 DAYS
Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Hi Serge,
In fact, this query does return the last execution prior to current
date - 90 days, but it does not consider that IDs should not execute
between current date - 90 days and current date
The above query returns IDs that have and have not executed between
current date - 90 days and current date.
The query I have posted works fine. I am looking for a way to tune
it ... Is there any way to produce the same result set with a single
table scan ?
TIA, Michel.

Serge's query works properly on my LUW FP14 system. I'd recommend trying
a small test case (a sample table containing three rows) to verify that
this is failing on your system. If it's failing, then it needs to be
reported. I strongly doubt that FP15 will break this simple SQL statement.

[psherman@T40 ~]$ db2 "create table t1 (id char(2) not null,
execution_date date not null with default)"
DB20000I The SQL command completed successfully.
[psherman@T40 ~]$ db2 "insert into t1 values ('A','2007-12-01')"
DB20000I The SQL command completed successfully.
[psherman@T40 ~]$ db2 "insert into t1 values ('A','2006-12-01')"
DB20000I The SQL command completed successfully.
[psherman@T40 ~]$ db2 "insert into t1 values ('B','2006-12-01')"
DB20000I The SQL command completed successfully.
[psherman@T40 ~]$ db2 "select id, max(execution_date) from t1 group by id"
ID 2
-- ----------
A 12/01/2007
B 12/01/2006

2 record(s) selected.

[psherman@T40 ~]$ db2 "select id, max(execution_date) from t1 group by
id having max(execution_date) < current_date - 90 days"

ID 2
-- ----------
B 12/01/2006

1 record(s) selected.

Phil Sherman-
Phil, Serge and Dean, thank you for your time. Let me try to explain
why this is incorrect:

Consider the following table data:

db2 "select MONITOR_DATE from ASSET.T_METERING where
MACHINE_ID='000B6A10620ED121C9481542' and SW_ID=77"

MONITOR_DATE
------------
10/17/2007
10/18/2007
11/30/2007

If I run my original query:

db2 "select MACHINE_ID, SW_ID from ASSET.TBL_ASSET_SW S where not
exists (select 1 from ASSET.T_METERING M where
M.MACHINE_ID=S.MACHINE_ID and S.SW_ID=M.SW_ID and MONITOR_DATE between
current date - 30 days and current date) and
MACHINE_ID='000B6A10620ED121C9481542' and SW_ID=77"

I get no results. That´s abovius, because there was an execution on
11/30/2007. Note that I do need to join with TBL_ASSET_SW for other
purposes.

Now, executing the queries you have suggested:

db2 "select MACHINE_ID,SW_ID from ASSET.T_METERING where
MACHINE_ID='000B6A10620ED121C9481542' group by MACHINE_ID, SW_ID
having MAX(MONITOR_DATE) < current date - 90 days"

SW_ID 77 is not returned here, which is nice.

Next:

db2 "select S.MACHINE_ID, S.SW_ID
from ASSET.TBL_ASSET_SW S
where NOT EXISTS (select MACHINE_ID, SW_ID
from ASSET.T_METERING M
where M.MACHINE_ID='000B6A10620ED121C9481542'
and
S.MACHINE_ID = M.MACHINE_ID and
S.SW_ID = M.SW_ID
group by MACHINE_ID, SW_ID
having MAX(MONITOR_DATE) < current date - 30
days)
and MACHINE_ID='000B6A10620ED121C9481542' and SW_ID=77"

It does return one row:

MACHINE_ID SW_ID
------------------------ ----------
000B6A10620ED121C9481542 77.

I understand what´s happening here, and the result set is not
incorrect (according to the query). But for my scenario, this is just
wrong. I need to certify that the execution did not happen between
current date - X days and current date.

Hope that it makes sense now.

Thanks again.

Dec 2 '07 #9

P: n/a
Michel Esber wrote:
I understand what´s happening here, and the result set is not
incorrect (according to the query).
I do not understand what's happening here.
Hope that it makes sense now.
No it doesn't, you are assuming we have the same local knowledge as you.
1. We are missing the sample data for the asset table
2. You keep flip flopping between 30 days and 90 days which confuses the
matter
3. Shouldn't matter, but: Why are you adding the literal into the NOT
EXISTS with the HAVING?

I propose we restart with a clean slate:
Define both tables, define contents for both. Define the output.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Dec 2 '07 #10

P: n/a
On 2 dez, 14:16, Serge Rielau <srie...@ca.ibm.comwrote:
Michel Esber wrote:
I understand what´s happening here, and the result set is not
incorrect (according to the query).

I do not understand what's happening here.
Hope that it makes sense now.

No it doesn't, you are assuming we have the same local knowledge as you.
1. We are missing the sample data for the asset table
2. You keep flip flopping between 30 days and 90 days which confuses the
matter
3. Shouldn't matter, but: Why are you adding the literal into the NOT
EXISTS with the HAVING?

I propose we restart with a clean slate:
Define both tables, define contents for both. Define the output.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab


My apologies. Let me try to make this simple:

TBL_ASSET_SW (MACHINE_ID, SW_ID) values are:
A,1
B,2
C,3

T_METERING (MACHINE_ID, SW_ID, EXECUTION_DATE):

A,1,11/30/2007
A,1,10/15/2007
B,2,10/18/2007
B,2,10/17/2007
In this example, I need to retrieve the maximum execution for all
(MACHINE_ID, SW_ID) that have NOT EXECUTED between 11/25/2007 and
12/02/2007.

The expected output is:

B,2,10/18/2007
C,3,NULL

Even though (C,3) did not have any execution, I need it in the result
set. I was hoping to find a solution that did not scan T_METERING more
than once. I do not know if that is possible.

Thanks, Michel.

Dec 2 '07 #11

P: n/a
Ian
Michel Esber wrote:
On 2 dez, 14:16, Serge Rielau <srie...@ca.ibm.comwrote:
>Michel Esber wrote:
>>I understand what´s happening here, and the result set is not
incorrect (according to the query).
I do not understand what's happening here.
>>Hope that it makes sense now.
No it doesn't, you are assuming we have the same local knowledge as you.
1. We are missing the sample data for the asset table
2. You keep flip flopping between 30 days and 90 days which confuses the
matter
3. Shouldn't matter, but: Why are you adding the literal into the NOT
EXISTS with the HAVING?

I propose we restart with a clean slate:
Define both tables, define contents for both. Define the output.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

My apologies. Let me try to make this simple:

TBL_ASSET_SW (MACHINE_ID, SW_ID) values are:
A,1
B,2
C,3

T_METERING (MACHINE_ID, SW_ID, EXECUTION_DATE):

A,1,11/30/2007
A,1,10/15/2007
B,2,10/18/2007
B,2,10/17/2007
In this example, I need to retrieve the maximum execution for all
(MACHINE_ID, SW_ID) that have NOT EXECUTED between 11/25/2007 and
12/02/2007.

The expected output is:

B,2,10/18/2007
C,3,NULL

Even though (C,3) did not have any execution, I need it in the result
set. I was hoping to find a solution that did not scan T_METERING more
than once. I do not know if that is possible.
Yes, this is possible. Since your original posts were looking for
data older than a certain age (and the example above asks for data
older than 7 days (since today is 2007-12-02):

select
a.machine_id,
a.sw_id,
max(b.execution_date)
from
tbl_asset_sw a
left outer join t_metering b
on (a.machine_id = b.machine_id and a.sw_id = b.sw_id)
group by
a.machine_id,
a.sw_id
having
max(b.execution_date) is null
or
max(b.execution_date) < current date - 7 days
Good luck,
Dec 3 '07 #12

P: n/a
On 3 dez, 02:05, Ian <ianb...@mobileaudio.comwrote:
Michel Esber wrote:
On 2 dez, 14:16, Serge Rielau <srie...@ca.ibm.comwrote:
Michel Esber wrote:
I understand what´s happening here, and the result set is not
incorrect (according to the query).
I do not understand what's happening here.
>Hope that it makes sense now.
No it doesn't, you are assuming we have the same local knowledge as you..
1. We are missing the sample data for the asset table
2. You keep flip flopping between 30 days and 90 days which confuses the
matter
3. Shouldn't matter, but: Why are you adding the literal into the NOT
EXISTS with the HAVING?
I propose we restart with a clean slate:
Define both tables, define contents for both. Define the output.
Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
My apologies. Let me try to make this simple:
TBL_ASSET_SW (MACHINE_ID, SW_ID) values are:
A,1
B,2
C,3
T_METERING (MACHINE_ID, SW_ID, EXECUTION_DATE):
A,1,11/30/2007
A,1,10/15/2007
B,2,10/18/2007
B,2,10/17/2007
In this example, I need to retrieve the maximum execution for all
(MACHINE_ID, SW_ID) that have NOT EXECUTED between 11/25/2007 and
12/02/2007.
The expected output is:
B,2,10/18/2007
C,3,NULL
Even though (C,3) did not have any execution, I need it in the result
set. I was hoping to find a solution that did not scan T_METERING more
than once. I do not know if that is possible.

Yes, this is possible. Since your original posts were looking for
data older than a certain age (and the example above asks for data
older than 7 days (since today is 2007-12-02):

select
a.machine_id,
a.sw_id,
max(b.execution_date)
from
tbl_asset_sw a
left outer join t_metering b
on (a.machine_id = b.machine_id and a.sw_id = b.sw_id)
group by
a.machine_id,
a.sw_id
having
max(b.execution_date) is null
or
max(b.execution_date) < current date - 7 days

Good luck,
Thanks Ian and all that have helped. I now realize my first query was
awful. Your solution was simple and efficient.
Dec 4 '07 #13

This discussion thread is closed

Replies have been disabled for this discussion.