By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
428,997 Members | 1,470 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 428,997 IT Pros & Developers. It's quick & easy.

DPSIs: Materialisation of column values for DPSI use

P: n/a
Hi
I am looking in to utilising the new functionality of DPSIs on zOS UDB
(DB2 v8), but I have a question regarding the following scenario.

- The first table is an Account table whose columns are Account,
Invoice Date, Partition#. This is a NPSI table.
There are about 500,000 accounts which gives me 6MM records (500k * 12
months)

- The second is an Invoice table whose columns are Account, Invoice
Date, Partition#, Invoice. This is a DPSI table partitioned on
Partition# which corresponds to the Invoice Julian Day, and is
clustered within a partition by Account.
There are 365 partitions, and an invoice comes through for a given
account once a month. Therefore for a year for a given account I will
find the invoices within 12 partitions.
A given invoice may take up 20 records, therefore there are 120MM
records on this table (6MM * 20 records)

- Finally I have a Driver table with a list of Accounts I need to pull
off for use.
There might be 100,000 records on here with accounts spanning all the
DPSI partitions for the year.

In a total NPSI world I could have the following query:

SELECT INV.INVOICE
FROM DRVR_TBL AS DRVR
, ACCT_TBL AS ACCT
, INV_TBL AS INV
WHERE ACCT.ACCT_NO = DRVR.ACCT_NO
AND INV.PART_NO = ACCT.PART_NO
AND INV.ACCT_NO = ACCT.ACCT_NO

However, by utilising the utility-based data loading flexibility of the
DPSI tables (ie one partition is offline whilst the rest are online
24x7, and being read), the PART-NO needs to be materialised first
before passing into a query otherwise DB2 goes against all the
partitions in order to index search for the account number.

Therefore the options I can see are :

OPTION#1: Open two CURSORs in my COBOL program thus:
CURSOR#1
SELECT ACCT.PART_NO, ACCT.ACCT_NO
FROM DRVR_TBL AS DRVR
, ACCT_TBL AS ACCT
WHERE ACCT.ACCT_NO = DRVR.ACCT_NO

CURSOR#2
SELECT INV.INVOICE
FROM INV_TBL AS INV
WHERE INV.PART_NO = :part-no
AND INV.ACCT_NO = :acct-no
OPTION#2: A rework of the following query which can be done on UDB for
AIX which is effecively a non-correlated sub-query but using two
columns instead of one.
SELECT INV.INVOICE
FROM INV_TBL AS INV
WHERE (INV.PART_NO, INV.ACCT_NO) IN (SELECT ACCT.PART_NO,
ACCT.ACCT_NO
FROM DRVR_TBL AS DRVR
, ACCT_TBL AS ACCT
WHERE ACCT.ACCT_NO =
DRVR.ACCT_NO)
OPTION#3: The rework could be a concatonation of the two columns

- The problem I have with #1 is that I have to OPEN, FETCH and CLOSE my
CURSOR#2 for all the 100k accounts I am reading from my CURSOR#1 on my
Driver table, thereby giving me an overhead ... am I worrying to much
about this overhead? is it really that much?
- The problem I have with #2 is that it can't be done on zOS UDB. Also
I'm worried that wouldn't DB2 have to materialise the whole resultset
before passing back thereby giving me possible temp-space issues, or
does it give me the results in part-no¦¦acct-no chunks?
- The problem with #3 is would the optimiser end up using any of my
column based indexes?

Any ideas and solutions would be most appreciated
Thanks in advance
Simon

Nov 17 '06 #1
Share this Question
Share on Google+
1 Reply


P: n/a
Your description indicates that you start with a list of accounts and
want to retrieve all of the invoices for the year for each of the accounts.

If you use your option #2, you should retrieve all 12 rows of account
information first, then use an IN clause to retrieve all of the invoices
for that account. This should limit you to 12 of the invoice partitions
and the indexes should take you directly to the rows. Plan on having 12
values for the IN clause, fill all of them with zero then overlay the
values as you read the account rows. This will allow you to use static
SQL when there are less than 12 invoices in a year.

As an interesting aside - what do you do in leap years? Those have 366
days in them.

Second aside - why are you processing 20% (100,000 of 500,000) of the
accounts every night? Who looks at all that data? Who has the time to
set up the driver table data every day? If you bill each account once a
month, then less than 25k accounts change each day. If you have 100k
records and each represents an invoice for an account over a year's
time, then you have only about 8300 accounts being processed. Using the
IN clause above will dramatically cut the number of SQL statements
issued against the invoice table.
Phil Sherman

Sigmazen wrote:
Hi
I am looking in to utilising the new functionality of DPSIs on zOS UDB
(DB2 v8), but I have a question regarding the following scenario.

- The first table is an Account table whose columns are Account,
Invoice Date, Partition#. This is a NPSI table.
There are about 500,000 accounts which gives me 6MM records (500k * 12
months)

- The second is an Invoice table whose columns are Account, Invoice
Date, Partition#, Invoice. This is a DPSI table partitioned on
Partition# which corresponds to the Invoice Julian Day, and is
clustered within a partition by Account.
There are 365 partitions, and an invoice comes through for a given
account once a month. Therefore for a year for a given account I will
find the invoices within 12 partitions.
A given invoice may take up 20 records, therefore there are 120MM
records on this table (6MM * 20 records)

- Finally I have a Driver table with a list of Accounts I need to pull
off for use.
There might be 100,000 records on here with accounts spanning all the
DPSI partitions for the year.

In a total NPSI world I could have the following query:

SELECT INV.INVOICE
FROM DRVR_TBL AS DRVR
, ACCT_TBL AS ACCT
, INV_TBL AS INV
WHERE ACCT.ACCT_NO = DRVR.ACCT_NO
AND INV.PART_NO = ACCT.PART_NO
AND INV.ACCT_NO = ACCT.ACCT_NO

However, by utilising the utility-based data loading flexibility of the
DPSI tables (ie one partition is offline whilst the rest are online
24x7, and being read), the PART-NO needs to be materialised first
before passing into a query otherwise DB2 goes against all the
partitions in order to index search for the account number.

Therefore the options I can see are :

OPTION#1: Open two CURSORs in my COBOL program thus:
CURSOR#1
SELECT ACCT.PART_NO, ACCT.ACCT_NO
FROM DRVR_TBL AS DRVR
, ACCT_TBL AS ACCT
WHERE ACCT.ACCT_NO = DRVR.ACCT_NO

CURSOR#2
SELECT INV.INVOICE
FROM INV_TBL AS INV
WHERE INV.PART_NO = :part-no
AND INV.ACCT_NO = :acct-no
OPTION#2: A rework of the following query which can be done on UDB for
AIX which is effecively a non-correlated sub-query but using two
columns instead of one.
SELECT INV.INVOICE
FROM INV_TBL AS INV
WHERE (INV.PART_NO, INV.ACCT_NO) IN (SELECT ACCT.PART_NO,
ACCT.ACCT_NO
FROM DRVR_TBL AS DRVR
, ACCT_TBL AS ACCT
WHERE ACCT.ACCT_NO =
DRVR.ACCT_NO)
OPTION#3: The rework could be a concatonation of the two columns

- The problem I have with #1 is that I have to OPEN, FETCH and CLOSE my
CURSOR#2 for all the 100k accounts I am reading from my CURSOR#1 on my
Driver table, thereby giving me an overhead ... am I worrying to much
about this overhead? is it really that much?
- The problem I have with #2 is that it can't be done on zOS UDB. Also
I'm worried that wouldn't DB2 have to materialise the whole resultset
before passing back thereby giving me possible temp-space issues, or
does it give me the results in part-no¦¦acct-no chunks?
- The problem with #3 is would the optimiser end up using any of my
column based indexes?

Any ideas and solutions would be most appreciated
Thanks in advance
Simon
Nov 17 '06 #2

This discussion thread is closed

Replies have been disabled for this discussion.