473,773 Members | 2,365 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Will splitting up a table improve our select performance?

[DB2 8.2.1 Workgroup on Suse 8.1 on a dual Xeon with 2.5Gb of ram, 6
discs at RAID5]

One of the larger tables in our database is now 6.8 million rows (1 per
financial transaction since 2000).

Every time an amendment is made to a booking, new rows are added to the
table for each transaction, so in general we never have any call to
update old rows. Usually, we only deal with analysis on transactions
in the current financial year or the previous one, but *occasionally*
we'll want to go back further.

So I'm thinking as follows:

Put every row with transaction_dat e earlier than 1 April 2004 into
transactions_to _end_mar04.
Put every row with a later transaction_dat e into transactions_cu rrent.
Put a clustering index on each table on transaction_dat e.
Create a view as follows:
CREATE VIEW transactions AS
SELECT * FROM transactions_to _end_mar04
UNION ALL
SELECT * FROM transactions_cu rrent;
My analysts will 99 times out of a hundred use the transaction_dat e
within the WHERE clause of any SQL they write against this view. Will
the clustering indexes be efficacious? Is the optimiser going to
realise it doesn't need to scan transactions_to _end_mar04 when my
analyst asks just for bookings from 1 April 05 onward?

We're trying this now and getting ready to put bits of SQL through the
explain, but I'm not sure if there's anything fundamental that I've
missed. So any comments/warnings/expressions of disdain would be
gratefully received.

Further to this, currently every table sits in the same SMS tablespace.
Would this set up be more efficient in multiple tablespaces, or would
that require DMS tablespaces? (DBA time and resource is costly to us
and we prefer not to use too much of it)

Thanks

Nov 12 '05 #1
13 2677
bka
The simplest way to split the table would be to move to enterprise and
use DPF to hash the table across multiple nodes. If you want to pursue
the UNION ALL approach, see this:
http://www-128.ibm.com/developerwork...m-0202zuzarte/

Nov 12 '05 #2
James Conrad St.John Foreman wrote:
[DB2 8.2.1 Workgroup on Suse 8.1 on a dual Xeon with 2.5Gb of ram, 6
discs at RAID5]

One of the larger tables in our database is now 6.8 million rows (1 per
financial transaction since 2000).

Every time an amendment is made to a booking, new rows are added to the
table for each transaction, so in general we never have any call to
update old rows. Usually, we only deal with analysis on transactions
in the current financial year or the previous one, but *occasionally*
we'll want to go back further.

So I'm thinking as follows:

Put every row with transaction_dat e earlier than 1 April 2004 into
transactions_to _end_mar04.
Put every row with a later transaction_dat e into transactions_cu rrent.
Put a clustering index on each table on transaction_dat e.
Create a view as follows:
CREATE VIEW transactions AS
SELECT * FROM transactions_to _end_mar04
UNION ALL
SELECT * FROM transactions_cu rrent;
My analysts will 99 times out of a hundred use the transaction_dat e
within the WHERE clause of any SQL they write against this view. Will
the clustering indexes be efficacious? Is the optimiser going to
realise it doesn't need to scan transactions_to _end_mar04 when my
analyst asks just for bookings from 1 April 05 onward?

We're trying this now and getting ready to put bits of SQL through the
explain, but I'm not sure if there's anything fundamental that I've
missed. So any comments/warnings/expressions of disdain would be
gratefully received.

Further to this, currently every table sits in the same SMS tablespace.
Would this set up be more efficient in multiple tablespaces, or would
that require DMS tablespaces? (DBA time and resource is costly to us
and we prefer not to use too much of it)

Thanks

Make sure you set INTRA_PARALLEL to YES in the Database Manager Config.
This should make best usage of your processors when processing the UNION
ALL.

Also DMS table spaces with multiple containers might be a better option.
You can have multiple containers (=directories) for SMS, but only when
you create the tablespace. With DMS you can add containers after it is
created.

--
Anton Versteeg
IBM Netherlands
Nov 12 '05 #3
Well.. Thinking purely in SQL, and not partitioning, hardware, etc.
Perhaps you can make use of a clustered table, and use year as a
cluseter index. This way you'll get a 3D matrix. Don't know for sure if
this will increase overall performance, but as far as I can see it, it
will improve index scan performance. Don't know from which FP this
feature was available.

-R-
Nov 12 '05 #4

James Conrad St.John Foreman wrote:
Is the optimiser going to
realise it doesn't need to scan transactions_to _end_mar04 when my
analyst asks just for bookings from 1 April 05 onward?


Short answer is yes, as long as proper check constraints are in place.

See this (somewhat dated, many improvements since) document for more
details on UNION ALL views processing in DB2:
http://www.ibm.com/developerworks/db...rte/index.html

Regards,
Miro.

Nov 12 '05 #5
Thanks.

Now putting this through some testing. We've got some pretty nasty
queries to test with, and with the largest so far, the results are:

Single table, no clustered index: 3,083,170 timerons
Single table, clustered index on transaction date: 3,083,169 timerons
Two tables, both clustered on transaction date, view over the top so it
still looks to the end user like 1 table: 1,209,692 timerons

So if the optimiser is estimating accurately, we should see some good
time savings (task is taking 57 minutes under the current (single,
unclustered table) configuration.

One thing confuses me. We've set INTRA_PARALLEL = YES in the DBM
config. However, the access plan graph generated has Parallelism :
None. I find that a bit odd, because looking at the graph generated,
it has two large subqueries (one 1,151,000 timerons, and the other
56,000 timerons) that I would assume would benefit from being run in
parallel. Is this an unwarranted assumption? Or is it that because
there's such a large difference in the estimate for each of the two
subqueries that the optimiser is choosing not to run in parallel,
because there's unlikely to be benefit? Is parallelism going to be of
no use with current disc config (RAID5) or space management (SMS) ?

Nov 12 '05 #6
Just a thought here. Wih INTRAS_PARALLEL ON did you also check: dbm cfg
MAX_QYERYDEGREE , should be at -1 or ANY or a value *=< no. of cou. Also
check at db cfg for DFT_DEGREE. It normally is defaulted to 1 and should be
set like MAX_QUERYDEGREE . If it is at default, then your access plan would
show no parallelism as the optimizer will always select the lesser of: SET
RUNTIME DEGREE, DFT_DEGREE,MAX_ QUERYDEGREE, in your case 1.
HTH, Pierre.

--
Pierre Saint-Jacques
SES Consultants Inc.
514-737-4515
"James Conrad St.John Foreman" <ja***********@ abcmail.co.uk> a écrit dans le
message de news: 11************* ********@g43g20 00...legro ups.com...
Thanks.

Now putting this through some testing. We've got some pretty nasty
queries to test with, and with the largest so far, the results are:

Single table, no clustered index: 3,083,170 timerons
Single table, clustered index on transaction date: 3,083,169 timerons
Two tables, both clustered on transaction date, view over the top so it
still looks to the end user like 1 table: 1,209,692 timerons

So if the optimiser is estimating accurately, we should see some good
time savings (task is taking 57 minutes under the current (single,
unclustered table) configuration.

One thing confuses me. We've set INTRA_PARALLEL = YES in the DBM
config. However, the access plan graph generated has Parallelism :
None. I find that a bit odd, because looking at the graph generated,
it has two large subqueries (one 1,151,000 timerons, and the other
56,000 timerons) that I would assume would benefit from being run in
parallel. Is this an unwarranted assumption? Or is it that because
there's such a large difference in the estimate for each of the two
subqueries that the optimiser is choosing not to run in parallel,
because there's unlikely to be benefit? Is parallelism going to be of
no use with current disc config (RAID5) or space management (SMS) ?


Nov 12 '05 #7
If it's taking 57 minutes to scan 6.8 million rows in a normal table,
and estimated time drops to half with the union-all approach - I'd
suggest to keep on going if possible! :-)

Personally, I'm most fond of MDC - it eats up more space, and can slow
down loads (inserts too probably). But there's no maintenance
required, and if you can identify 1-2 key attributes typically used in
these selects then it can provide vast performance improvements. And
depending on the table, you might not have to make any changes to it -
so it can be quite a lot less work than DPF, union-all, etc.

Just as a reference point: I've got seven year old hardware running a
databases with 300 million rows returning queries in 1-2 seconds. This
is due to mdc, paralleism, and lots of disks.

buck

Nov 12 '05 #8
Ian
James Conrad St.John Foreman wrote:
Thanks.

Now putting this through some testing. We've got some pretty nasty
queries to test with, and with the largest so far, the results are:

Single table, no clustered index: 3,083,170 timerons
Single table, clustered index on transaction date: 3,083,169 timerons
Two tables, both clustered on transaction date, view over the top so it
still looks to the end user like 1 table: 1,209,692 timerons

So if the optimiser is estimating accurately, we should see some good
time savings (task is taking 57 minutes under the current (single,
unclustered table) configuration.

One thing confuses me. We've set INTRA_PARALLEL = YES in the DBM
config. However, the access plan graph generated has Parallelism :
None. I find that a bit odd, because looking at the graph generated,
it has two large subqueries (one 1,151,000 timerons, and the other
56,000 timerons) that I would assume would benefit from being run in
parallel. Is this an unwarranted assumption? Or is it that because
there's such a large difference in the estimate for each of the two
subqueries that the optimiser is choosing not to run in parallel,
because there's unlikely to be benefit? Is parallelism going to be of
no use with current disc config (RAID5) or space management (SMS) ?


Obviously without seeing your query plan this is a guess, but are you
seeing two legs of a nested loop join? (i.e. is the operator sitting
above the two legs NLJOIN?) This type of join usually won't take
advantage of intra-partition parallelism because of how the join works
(search for nested loop join in the documentation).
On Windows all of the EDUs are threads within the db2syscs.exe process.
These include the agents that handle queries, the prefetchers & page
cleaners that read and write to the disks, the log writer, etc. (On
Unix/Linux, these are each separate processse). These threads will
run concurrently regardless of whether intra-partition parallelism
(INTRA_PARALLEL ) has been enabled or not.
Nov 12 '05 #9
Thanks: MAX_QUERYDEGREE was already at ANY, but our DFT_DEGREE was set
to 1.

Bit confused about RUNTIME DEGREE - is that in DB CFG or DBM CFG? (may
just be a problem with my sight whilst looking through all of these
parameters)

Nov 12 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
1765
by: larry | last post by:
Ok I am working on building my skills to convert my apps to LAMP (read I'm a semi noob), and there was one part I was thinking of. If I create two identical MySQL tables (we'll say, invoice and invoicearc) one will hold current period data and the other will hod out of period data - previous years stuff, which is only used in queries <5% of the time at most. Now can I join these two tables to make one table when doing queries that span...
21
3922
by: Rabbit63 | last post by:
Hi: I want to show a set of records in the database table on the clicnt browser. I have two ways to do this (writen in JScript): 1.The first way is: <% var sql = "select firstname from table1"; var obj=new ActiveXObject("ADODB.Recordset");
1
3001
by: Subrahmanyam Arya | last post by:
Dear oracle gurus, I have created a non partitioned table with no indexes and stuffed in about 15 million rows. Using oracle 8.1.7.4 standard. I then deleted 1 million and it took 1 solid hour. How can i reduce the time for deletes? Since oracle default locking behavior is row level locking, i changed to lock the table in exclusive mode before deleting. This dint help.. Any clues,
7
10815
by: Bing Wu | last post by:
Hi Folks, I have a very large table containing 170 million rows of coordinats: CREATE TABLE "DB2ADMIN"."COORDINATE" ( "FID" INTEGER NOT NULL , "AID" INTEGER NOT NULL , "X" REAL NOT NULL , "Y" REAL NOT NULL , "Z" REAL NOT NULL )
21
6982
by: CSN | last post by:
I have a pretty simple select query that joins a table (p) with 125K rows with another table (pc) with almost one million rows: select p.* from product_categories pc inner join products p on pc.product_id = p.id where pc.category_id = $category_id order by p.title
2
4423
by: RichardP via AccessMonster.com | last post by:
Hi there, I have a query running against Oracle which returns approx. 140,000 records. I need to store all this data locally in my BE database. Conventionally I would set up a table to contain this data and then run an Insert Into (append) query. Perhaps I might also experiment by toggling the UseTransaction property if performance was inadequate. When I run an append query to a linked table on a B/E database on my C drive,
7
1739
by: Matik | last post by:
Hi to everyone, My problem is, that I'm not so quite sure, which way should I go. The user is inputing by second part application a long string (let's say 128 characters), which are separated by semiclon. Example: A20;BU;AC40;MA50;E;E;IC;GREEN
11
2658
by: Paul H | last post by:
Suppose I have a table called tblPeople and I want a field to illustrate whether each person prefers cats or dogs. I could do it one of three ways. 1. A plain text field Create a text field in tblPeople called PreferredPet. 2. A lookup field that stores text values. Create a text field in tblPeople called PreferredPetID and use it to lookup an identical text field in tblPreferredPets.
2
2773
by: sdanda | last post by:
Hi , Do you have any idea how to improve my java class performance while selecting and inserting data into DB using JDBC Connectivity ......... This has to work for more than 8,00,000 of records ..... Can you give some performance tips if you have known 1) For this I am using oci driver ( because I m using oracle 10g) instead of thin driver 2) In that programme I m using prepared statement instead of statement 3) I am...
0
9621
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9454
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10106
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
9914
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
7463
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6717
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5355
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5484
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3610
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.