By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
432,548 Members | 1,735 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 432,548 IT Pros & Developers. It's quick & easy.

Improve query performance on large table

P: n/a
Hi Folks,

I have a very large table containing 170 million rows of coordinats:

CREATE TABLE "DB2ADMIN"."COORDINATE" (
"FID" INTEGER NOT NULL ,
"AID" INTEGER NOT NULL ,
"X" REAL NOT NULL ,
"Y" REAL NOT NULL ,
"Z" REAL NOT NULL )
IN "USERSPACE1" ;

ALTER TABLE "DB2ADMIN"."COORDINATE" APPEND ON;
CREATE INDEX "DB2ADMIN"."IDX_COORDINATE1" ON "DB2ADMIN"."COORDINATE"
("FID" ASC) PCTFREE 0 ALLOW REVERSE SCANS;
CREATE INDEX "DB2ADMIN"."IDX_COORDINATE2" ON "DB2ADMIN"."COORDINATE"
("AID" ASC) PCTFREE 0 ALLOW REVERSE SCANS;
ALTER TABLE "DB2ADMIN"."COORDINATE" ADD CONSTRAINT
"PK_COORDINATE8" PRIMARY KEY ("FID", "AID");

When I issue:

SELECT X,Y,Z FROM COORDINATE WHERE FID<1000;

it takes more than 10 minutes to finish. The system runs on a Sun Cobalt box (dual-PIII 1.4G, 1GB mem). I have tuned the Buffer Pool to 500MB. Is there any other way to improve the performance?

A strange thing is, when I join the above SQL with another table, it reduces the time:

SELECT C.X,C.Y,C.Z FROM COORDINATE AS C, FRAME AS F
WHERE C.FID=F.ID AND F.ID<1000

It takes only 7 minutes. FID is a foreign key of table 'FRAME'.

Please advice.

Thanks,

Bing

Nov 12 '05 #1
Share this Question
Share on Google+
7 Replies


P: n/a

"Bing Wu" <bi**@biop.ox.ac.uk> wrote in message
news:bo**********@news.ox.ac.uk...
Hi Folks,

I have a very large table containing 170 million rows of coordinats:

CREATE TABLE "DB2ADMIN"."COORDINATE" (
"FID" INTEGER NOT NULL ,
"AID" INTEGER NOT NULL ,
"X" REAL NOT NULL ,
"Y" REAL NOT NULL ,
"Z" REAL NOT NULL )
IN "USERSPACE1" ;

ALTER TABLE "DB2ADMIN"."COORDINATE" APPEND ON;
CREATE INDEX "DB2ADMIN"."IDX_COORDINATE1" ON "DB2ADMIN"."COORDINATE"
("FID" ASC) PCTFREE 0 ALLOW REVERSE SCANS;
CREATE INDEX "DB2ADMIN"."IDX_COORDINATE2" ON "DB2ADMIN"."COORDINATE"
("AID" ASC) PCTFREE 0 ALLOW REVERSE SCANS;
ALTER TABLE "DB2ADMIN"."COORDINATE" ADD CONSTRAINT
"PK_COORDINATE8" PRIMARY KEY ("FID", "AID");

When I issue:

SELECT X,Y,Z FROM COORDINATE WHERE FID<1000;

it takes more than 10 minutes to finish. The system runs on a Sun Cobalt box (dual-PIII 1.4G, 1GB mem). I have tuned the Buffer Pool to 500MB. Is
there any other way to improve the performance?
A strange thing is, when I join the above SQL with another table, it reduces the time:
SELECT C.X,C.Y,C.Z FROM COORDINATE AS C, FRAME AS F
WHERE C.FID=F.ID AND F.ID<1000

It takes only 7 minutes. FID is a foreign key of table 'FRAME'.

Please advice.

Thanks,

Bing


Did you perform runstats utility with full statistics on the table and all
indexes?
RUNSTATS ON TABLE table-name DETAILED INDEXES ALL WITH DISTRIBUTION ON ALL
COLUMNS

What percent of the table is returned with this query?

Did you do a visual explain on this query? If so, is the index used?

If you only tested the first query once, it could be because the buffer
pools are loaded for the second query.
Nov 12 '05 #2

P: n/a
Hi Bing,

I guess you did not
RUNSTATS ON TABLE table-name WITH DISTRIBUTION AND DETAILED INDEXES ALL
after creating the index and filling the table with data?

With no distribution statistics <1000 would be estimated to roughly 50% of
the rows (85 million) which would not qualify for using an index.
If you expect a relatively small result set (e.g. 1000 fid's) this will
help.

Joachim Banzhaf

Bing Wu wrote:
Hi Folks,

I have a very large table containing 170 million rows of coordinats:

CREATE TABLE "DB2ADMIN"."COORDINATE" (
"FID" INTEGER NOT NULL ,
"AID" INTEGER NOT NULL ,
"X" REAL NOT NULL ,
"Y" REAL NOT NULL ,
"Z" REAL NOT NULL )
IN "USERSPACE1" ;

ALTER TABLE "DB2ADMIN"."COORDINATE" APPEND ON;
CREATE INDEX "DB2ADMIN"."IDX_COORDINATE1" ON "DB2ADMIN"."COORDINATE"
("FID" ASC) PCTFREE 0 ALLOW REVERSE SCANS;
CREATE INDEX "DB2ADMIN"."IDX_COORDINATE2" ON "DB2ADMIN"."COORDINATE"
("AID" ASC) PCTFREE 0 ALLOW REVERSE SCANS;
ALTER TABLE "DB2ADMIN"."COORDINATE" ADD CONSTRAINT
"PK_COORDINATE8" PRIMARY KEY ("FID", "AID");

When I issue:

SELECT X,Y,Z FROM COORDINATE WHERE FID<1000;

it takes more than 10 minutes to finish. The system runs on a Sun Cobalt
box (dual-PIII 1.4G, 1GB mem). I have tuned the Buffer Pool to 500MB. Is
there any other way to improve the performance?

A strange thing is, when I join the above SQL with another table, it
reduces the time:

SELECT C.X,C.Y,C.Z FROM COORDINATE AS C, FRAME AS F
WHERE C.FID=F.ID AND F.ID<1000

It takes only 7 minutes. FID is a foreign key of table 'FRAME'.

Please advice.

Thanks,

Bing


Nov 12 '05 #3

P: n/a
Thanks,

The query returns 5% of rows. I have the result from dynexpln:

SQL Statement:
DECLARE C1 CURSOR
FOR
select c.x, c.y, c.z
from coordinate as c, frame as f
where f.tid=1 and c.fid=f.id and f.id<1000
Section Code Page = 819

Estimated Cost = 4862935.000000
Estimated Cardinality = 3394883.500000

Access Table Name = DB2ADMIN.FRAME ID = 2,38
| #Columns = 0
| Index Scan: Name = DB2ADMIN.PK_FRAME3 ID = 5
| | Regular Index (Not Clustered)
| | Index Columns:
| | | 1: ID (Ascending)
| | #Key Columns = 1
| | | Start Key: Beginning of Index
| | | Stop Key: Exclusive Value
| | | | 1: 1000
| | Index-Only Access
| | Index Prefetch: None
| | Insert Into Sorted Temp Table ID = t1
| | | #Columns = 1
| | | #Sort Key Columns = 1
| | | | Key 1: (Ascending)
| | | Sortheap Allocation Parameters:
| | | | #Rows = 804
| | | | Row Width = 12
| | | Piped
| | | Duplicate Elimination
| Isolation Level: Uncommitted Read
| Lock Intents
| | Table: Intent None
| | Row : None
Sorted Temp Table Completion ID = t1
List Prefetch Preparation
| Access Table Name = DB2ADMIN.FRAME ID = 2,38
| | #Columns = 1
| | Relation Scan
| | | Prefetch: 50 Pages
| | Lock Intents
| | | Table: Intent Share
| | | Row : Next Key Share
| | Sargable Predicate(s)
| | | #Predicates = 2
| | Process Build Table for Hash Join
Hash Join
| Early Out: Single Match Per Outer Row
| Estimated Build Size: 12093
| Estimated Probe Size: 220287872
| Access Table Name = DB2ADMIN.COORDINATE ID = 2,6
| | #Columns = 0
| | Index Scan: Name = DB2ADMIN.IDX_COORDINATE1 ID = 2
| | | Regular Index (Not Clustered)
| | | Index Columns:
| | | | 1: FID (Ascending)
| | | #Key Columns = 1
| | | | Start Key: Beginning of Index
| | | | Stop Key: Exclusive Value
| | | | | 1: 1000
| | | Index-Only Access
| | | Index Prefetch: Eligible 31867
| | | Insert Into Sorted Temp Table ID = t2
| | | | #Columns = 1
| | | | #Sort Key Columns = 1
| | | | | Key 1: (Ascending)
| | | | Sortheap Allocation Parameters:
| | | | | #Rows = 6835684
| | | | | Row Width = 12
| | | | Piped
| | | | Duplicate Elimination
| | Isolation Level: Uncommitted Read
| | Lock Intents
| | | Table: Intent None
| | | Row : None
| Sorted Temp Table Completion ID = t2
| List Prefetch Preparation
| | Access Table Name = DB2ADMIN.COORDINATE ID = 2,6
| | | #Columns = 4
| | | Relation Scan
| | | | Prefetch: 1029488 Pages
| | | Lock Intents
| | | | Table: Intent Share
| | | | Row : Next Key Share
| | | Sargable Predicate(s)
| | | | #Predicates = 1
| | | Process Probe Table for Hash Join
Return Data to Application
| #Columns = 3
I did test two queries several times and it still truns out query2 3 mins ahead of query1. Any thoughts?

Cheers,

Bing
Mark A wrote:
"Bing Wu" <bi**@biop.ox.ac.uk> wrote in message
news:bo**********@news.ox.ac.uk...
Hi Folks,

I have a very large table containing 170 million rows of coordinats:

CREATE TABLE "DB2ADMIN"."COORDINATE" (
"FID" INTEGER NOT NULL ,
"AID" INTEGER NOT NULL ,
"X" REAL NOT NULL ,
"Y" REAL NOT NULL ,
"Z" REAL NOT NULL )
IN "USERSPACE1" ;

ALTER TABLE "DB2ADMIN"."COORDINATE" APPEND ON;
CREATE INDEX "DB2ADMIN"."IDX_COORDINATE1" ON "DB2ADMIN"."COORDINATE"
("FID" ASC) PCTFREE 0 ALLOW REVERSE SCANS;
CREATE INDEX "DB2ADMIN"."IDX_COORDINATE2" ON "DB2ADMIN"."COORDINATE"
("AID" ASC) PCTFREE 0 ALLOW REVERSE SCANS;
ALTER TABLE "DB2ADMIN"."COORDINATE" ADD CONSTRAINT
"PK_COORDINATE8" PRIMARY KEY ("FID", "AID");

When I issue:

SELECT X,Y,Z FROM COORDINATE WHERE FID<1000;

it takes more than 10 minutes to finish. The system runs on a Sun Cobalt


box (dual-PIII 1.4G, 1GB mem). I have tuned the Buffer Pool to 500MB. Is
there any other way to improve the performance?
A strange thing is, when I join the above SQL with another table, it


reduces the time:
SELECT C.X,C.Y,C.Z FROM COORDINATE AS C, FRAME AS F
WHERE C.FID=F.ID AND F.ID<1000

It takes only 7 minutes. FID is a foreign key of table 'FRAME'.

Please advice.

Thanks,

Bing

Did you perform runstats utility with full statistics on the table and all
indexes?
RUNSTATS ON TABLE table-name DETAILED INDEXES ALL WITH DISTRIBUTION ON ALL
COLUMNS

What percent of the table is returned with this query?

Did you do a visual explain on this query? If so, is the index used?

If you only tested the first query once, it could be because the buffer
pools are loaded for the second query.


Nov 12 '05 #4

P: n/a

"Bing Wu" <bi**@biop.ox.ac.uk> wrote in message
news:3F**************@biop.ox.ac.uk...
Thanks,

The query returns 5% of rows. I have the result from dynexpln:

SQL Statement:
DECLARE C1 CURSOR
FOR
select c.x, c.y, c.z
from coordinate as c, frame as f
where f.tid=1 and c.fid=f.id and f.id<1000
Section Code Page = 819

Estimated Cost = 4862935.000000
Estimated Cardinality = 3394883.500000

Access Table Name = DB2ADMIN.FRAME ID = 2,38
| #Columns = 0
| Index Scan: Name = DB2ADMIN.PK_FRAME3 ID = 5
| | Regular Index (Not Clustered)
| | Index Columns:
| | | 1: ID (Ascending)
| | #Key Columns = 1
| | | Start Key: Beginning of Index
| | | Stop Key: Exclusive Value
| | | | 1: 1000
| | Index-Only Access
| | Index Prefetch: None
| | Insert Into Sorted Temp Table ID = t1
| | | #Columns = 1
| | | #Sort Key Columns = 1
| | | | Key 1: (Ascending)
| | | Sortheap Allocation Parameters:
| | | | #Rows = 804
| | | | Row Width = 12
| | | Piped
| | | Duplicate Elimination
| Isolation Level: Uncommitted Read
| Lock Intents
| | Table: Intent None
| | Row : None
Sorted Temp Table Completion ID = t1
List Prefetch Preparation
| Access Table Name = DB2ADMIN.FRAME ID = 2,38
| | #Columns = 1
| | Relation Scan
| | | Prefetch: 50 Pages
| | Lock Intents
| | | Table: Intent Share
| | | Row : Next Key Share
| | Sargable Predicate(s)
| | | #Predicates = 2
| | Process Build Table for Hash Join
Hash Join
| Early Out: Single Match Per Outer Row
| Estimated Build Size: 12093
| Estimated Probe Size: 220287872
| Access Table Name = DB2ADMIN.COORDINATE ID = 2,6
| | #Columns = 0
| | Index Scan: Name = DB2ADMIN.IDX_COORDINATE1 ID = 2
| | | Regular Index (Not Clustered)
| | | Index Columns:
| | | | 1: FID (Ascending)
| | | #Key Columns = 1
| | | | Start Key: Beginning of Index
| | | | Stop Key: Exclusive Value
| | | | | 1: 1000
| | | Index-Only Access
| | | Index Prefetch: Eligible 31867
| | | Insert Into Sorted Temp Table ID = t2
| | | | #Columns = 1
| | | | #Sort Key Columns = 1
| | | | | Key 1: (Ascending)
| | | | Sortheap Allocation Parameters:
| | | | | #Rows = 6835684
| | | | | Row Width = 12
| | | | Piped
| | | | Duplicate Elimination
| | Isolation Level: Uncommitted Read
| | Lock Intents
| | | Table: Intent None
| | | Row : None
| Sorted Temp Table Completion ID = t2
| List Prefetch Preparation
| | Access Table Name = DB2ADMIN.COORDINATE ID = 2,6
| | | #Columns = 4
| | | Relation Scan
| | | | Prefetch: 1029488 Pages
| | | Lock Intents
| | | | Table: Intent Share
| | | | Row : Next Key Share
| | | Sargable Predicate(s)
| | | | #Predicates = 1
| | | Process Probe Table for Hash Join
Return Data to Application
| #Columns = 3
I did test two queries several times and it still truns out query2 3 mins ahead of query1. Any thoughts?
Cheers,

Bing

If you did the runstats requested with detail statistics (you did not say
above), and this is static SQL, then you need to rebind your package.

You did not show the access plan of query 1, but it may be that it just
takes that long to return the 200,000 rows (5%).

However I would use the Performance Wizard in the Control Center to tune the
database parameters, such as sort heaps, etc.
Nov 12 '05 #5

P: n/a

Mark A wrote:

If you did the runstats requested with detail statistics (you did not say
above), and this is static SQL, then you need to rebind your package.
Yes, I did run the runstats.
You did not show the access plan of query 1, but it may be that it just
takes that long to return the 200,000 rows (5%).

Ahh, the returned rows are 8.5 million (that's about 500MB in file). But 10 minutes is still a bit long...
However I would use the Performance Wizard in the Control Center to tune the
database parameters, such as sort heaps, etc.


Thanks for the info. I'll try to figure around.

Bing

Nov 12 '05 #6

P: n/a
"Bing Wu" <bi**@biop.ox.ac.uk> wrote in message
news:bo**********@news.ox.ac.uk...

Ahh, the returned rows are 8.5 million (that's about 500MB in file). But 10 minutes is still a bit long...

Why not take the output file and copy it to another file to see how long it
takes for the operating system to create a file that large.
Nov 12 '05 #7

P: n/a
Thanks Mark. The copy I/O only took half minute to complete. I doubt there might be some parameters need to be tuned. Since I have done a MySQL implementation, the same SQL on the same datasets took only 3 minutes...

Bing

Mark A wrote:
"Bing Wu" <bi**@biop.ox.ac.uk> wrote in message
news:bo**********@news.ox.ac.uk...
Ahh, the returned rows are 8.5 million (that's about 500MB in file). But


10 minutes is still a bit long...

Why not take the output file and copy it to another file to see how long it
takes for the operating system to create a file that large.


Nov 12 '05 #8

This discussion thread is closed

Replies have been disabled for this discussion.