rows read vs. rows selected. | | |
Hi All:
I have a query which is running against large table. The query:
SELECT DSRC_ACCT_ID, ADDR1, ADDR2, ADDR3, CITY, STATE, POSTAL_CODE,
COUNT(*) FROM ERD.ADDRESSA GROUP BY DSRC_ACCT_ID, ADDR1, ADDR2, ADDR3,
CITY, STATE, POSTAL_CODE HAVING COUNT(*) > 2
The applcation snapshot monitor shows:
Rows selected = 181
Rows read = 646485498
Rows written = 933743357
The rows read is too high compared to rows selected.
(The table has 933743360 rows)
My questions is: What's the best way to increase selectivity of this
query?
Thanks in Advance.
Vijay
The plan is using the correct index:
RETURN
( 1)
|
FILTER
( 2)
|
GRPBY
( 3)
|
TBSCAN
( 4)
|
SORT
( 5)
|
IXSCAN
( 6)
/ \
Index: Table:
ERD ERD
ADDR_QRYA ADDRESSA | | | | re: rows read vs. rows selected.
"UDBDBA" <vijay.sitaram@gmail.com> wrote in message
news:1128113621.451961.116170@g49g2000cwa.googlegr oups.com...[color=blue]
> Hi All:
>
> I have a query which is running against large table. The query:
>
> SELECT DSRC_ACCT_ID, ADDR1, ADDR2, ADDR3, CITY, STATE, POSTAL_CODE,
> COUNT(*) FROM ERD.ADDRESSA GROUP BY DSRC_ACCT_ID, ADDR1, ADDR2, ADDR3,
> CITY, STATE, POSTAL_CODE HAVING COUNT(*) > 2
>
> The applcation snapshot monitor shows:
> Rows selected = 181
> Rows read = 646 485 498
> Rows written = 933 743 357
>
>
> The rows read is too high compared to rows selected.
> (The table has 933 743 360 rows)
>
> My questions is: What's the best way to increase selectivity of this
> query?
>
>
> Thanks in Advance.
>
> Vijay
>[/color]
If you want to increase selectivity, you need to use a WHERE clause before
the group by. If you want to evaluate every row, then you cannot do that.
I don't think that the snapshot monitor data you are looking at is telling
you what you think it is.
The best way to improve the performance on a table scan on a very large
table is to use multiple containers (on separate physical devices) and to
enable intra-partition parallelism. Make sure your prefetch size is n times
your extent size of the tablespace (where n is the number of containers).
You could also consider splitting the data into multiple tables and using a
UNION ALL view, which would also enable parallelism. The degree of
parallelism you want depends on the number of CPUs and separate physical
disk devices you have.
You should also look sort heaps, temporary tablespaces, and make sure they
are large enough.
If your application does mostly table scans (data warehouse application),
increase your tablespace page size to 16K or 32K (this will hurt your OLTP
transactions if you have any). | | | | re: rows read vs. rows selected.
UDBDBA wrote:[color=blue]
> Hi All:
>
> I have a query which is running against large table. The query:
>
> SELECT DSRC_ACCT_ID, ADDR1, ADDR2, ADDR3, CITY, STATE, POSTAL_CODE,
> COUNT(*) FROM ERD.ADDRESSA GROUP BY DSRC_ACCT_ID, ADDR1, ADDR2, ADDR3,
> CITY, STATE, POSTAL_CODE HAVING COUNT(*) > 2
>
> The applcation snapshot monitor shows:
> Rows selected = 181
> Rows read = 646485498
> Rows written = 933743357
>
>
> The rows read is too high compared to rows selected.
> (The table has 933743360 rows)
>
> My questions is: What's the best way to increase selectivity of this
> query?
>
>
> Thanks in Advance.
>
> Vijay
>
>
> The plan is using the correct index:
>
> RETURN
> ( 1)
> |
> FILTER
> ( 2)
> |
> GRPBY
> ( 3)
> |
> TBSCAN
> ( 4)
> |
> SORT
> ( 5)
> |
> IXSCAN
> ( 6)
> / \
> Index: Table:
> ERD ERD
> ADDR_QRYA ADDRESSA
>[/color]
What's the ADDR_QRYA index defined on?
Do you ahve an index on your group by columns (in that order!):
(DSRC_ACCT_ID, ADDR1, ADDR2, ADDR3, CITY, STATE, POSTAL_CODE)
Reason why I'm asking is that the SORT should go (do you have sort
overflows? (rows written...??).
A totally different approach could be to employ an MQT.
Cheers
Serge
--
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab | | | | re: rows read vs. rows selected.
The index definition:
create UNIQUE index addr_qrya on erd.addressa (DSRC_ACCT_ID,ADDR_ID)
INCLUDE (ADDR1,ADDR2, ADDR3, CITY, STATE, POSTAL_CODE) ALLOW REVERSE
SCANS;
The access graph shows an index only access, but the SORT and TBSCAN in
access graph .. can you explain it?
We evaluated MQT vs. Adding an extra index with INCLUDE columns. Seems
like this index could be used by other queries and saves us the
overhead of double the operation (update,insert.logging,locking...) by
having MQT.
Thanks!
Vijay |  | Similar DB2 Database bytes | | | /bytes/about
We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights.
Get the best answers to your questions from over 226,449 network members.
|