[DB2 8.2.1 Workgroup on Suse 8.1 on a dual Xeon with 2.5Gb of ram, 6
discs at RAID5]
One of the larger tables in our database is now 6.8 million rows (1 per
financial transaction since 2000).
Every time an amendment is made to a booking, new rows are added to the
table for each transaction, so in general we never have any call to
update old rows. Usually, we only deal with analysis on transactions
in the current financial year or the previous one, but *occasionally*
we'll want to go back further.
So I'm thinking as follows:
Put every row with transaction_date earlier than 1 April 2004 into
transactions_to_end_mar04.
Put every row with a later transaction_date into transactions_current.
Put a clustering index on transaction_date on each table.
Create a view as follows:
CREATE VIEW transactions AS
SELECT * FROM transactions_to_end_mar04
UNION ALL
SELECT * FROM transactions_current;
My analysts will 99 times out of a hundred use the transaction_date
within the WHERE clause of any SQL they write against this view. Will
the clustering indexes be efficacious? Is the optimiser going to
realise it doesn't need to scan transactions_to_end_mar04 when my
analyst asks just for bookings from 1 April 05 onward?
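A sketch of the DDL for this layout (the table names come from the post; the column definitions beyond transaction_date are illustrative, as the real table will have many more columns):

```sql
-- Historical partition, clustered on transaction_date
CREATE TABLE transactions_to_end_mar04 (
    transaction_id   INTEGER NOT NULL,
    transaction_date DATE    NOT NULL,
    amount           DECIMAL(12,2)
);
CREATE INDEX ix_hist_date
    ON transactions_to_end_mar04 (transaction_date) CLUSTER;

-- Current partition, same shape, same clustering
CREATE TABLE transactions_current (
    transaction_id   INTEGER NOT NULL,
    transaction_date DATE    NOT NULL,
    amount           DECIMAL(12,2)
);
CREATE INDEX ix_curr_date
    ON transactions_current (transaction_date) CLUSTER;
```

The CLUSTER keyword on CREATE INDEX tells DB2 to keep the table data ordered by that index as rows are inserted (a REORG against the index establishes the order for existing rows).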
We're trying this now and getting ready to put bits of SQL through the
explain, but I'm not sure if there's anything fundamental that I've
missed. So any comments/warnings/expressions of disdain would be
gratefully received.
Further to this, currently every table sits in the same SMS tablespace.
Would this setup be more efficient in multiple tablespaces, or would
that require DMS tablespaces? (DBA time and resource is costly to us
and we prefer not to use too much of it)
Thanks
James Conrad St.John Foreman wrote: Further to this, currently every table sits in the same SMS tablespace. Would this set up be more efficient in multiple tablespaces, or would that require DMS tablespaces?
Make sure you set INTRA_PARALLEL to YES in the Database Manager Config.
This should make the best use of your processors when processing the
UNION ALL.
Also DMS table spaces with multiple containers might be a better option.
You can have multiple containers (=directories) for SMS, but only when
you create the tablespace. With DMS you can add containers after it is
created.
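A minimal sketch of the DMS option Anton describes (paths and sizes are placeholders; sizes are in pages):

```sql
-- DMS table space striped over two file containers
CREATE TABLESPACE trans_dms
    MANAGED BY DATABASE
    USING (FILE '/db2/cont1/trans.dat' 50000,
           FILE '/db2/cont2/trans.dat' 50000);

-- Unlike SMS, a DMS table space can grow by adding containers later:
ALTER TABLESPACE trans_dms
    ADD (FILE '/db2/cont3/trans.dat' 50000);
```

Spreading containers across separate physical devices is what buys the I/O parallelism; multiple containers on the same RAID5 array gain less.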
--
Anton Versteeg
IBM Netherlands
Well... Thinking purely in SQL, and not partitioning, hardware, etc.
Perhaps you can make use of a clustered table, and use year as a
cluster index. This way you'll get a 3D matrix. Don't know for sure if
this will increase overall performance, but as far as I can see it, it
will improve index scan performance. Don't know from which FixPak this
feature was available.
-R-
James Conrad St.John Foreman wrote: Is the optimiser going to realise it doesn't need to scan transactions_to_end_mar04 when my analyst asks just for bookings from 1 April 05 onward?
Short answer is yes, as long as proper check constraints are in place.
See this (somewhat dated, many improvements since) document for more
details on UNION ALL views processing in DB2: http://www.ibm.com/developerworks/db...rte/index.html
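The check constraints Miro refers to might look like this (constraint names are illustrative); they let the optimizer prove that one branch of the UNION ALL view can return no rows for a given date predicate, so that branch is eliminated from the plan:

```sql
-- Every row in the historical table is before 1 April 2004...
ALTER TABLE transactions_to_end_mar04
    ADD CONSTRAINT ck_hist CHECK (transaction_date < '2004-04-01');

-- ...and every row in the current table is on or after it.
ALTER TABLE transactions_current
    ADD CONSTRAINT ck_curr CHECK (transaction_date >= '2004-04-01');

-- With these in place, a query against the view such as
--   SELECT ... FROM transactions WHERE transaction_date >= '2005-04-01'
-- contradicts ck_hist, so the optimizer can skip
-- transactions_to_end_mar04 entirely.
```

Without the constraints, DB2 has no way to know the tables are disjoint by date and must plan to scan both branches.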
Regards,
Miro.
Thanks.
Now putting this through some testing. We've got some pretty nasty
queries to test with, and with the largest so far, the results are:
Single table, no clustered index: 3,083,170 timerons
Single table, clustered index on transaction date: 3,083,169 timerons
Two tables, both clustered on transaction date, view over the top so it
still looks to the end user like 1 table: 1,209,692 timerons
So if the optimiser is estimating accurately, we should see some good
time savings (the task takes 57 minutes under the current single,
unclustered table configuration).
One thing confuses me. We've set INTRA_PARALLEL = YES in the DBM
config. However, the access plan graph generated has Parallelism :
None. I find that a bit odd, because looking at the graph generated,
it has two large subqueries (one 1,151,000 timerons, and the other
56,000 timerons) that I would assume would benefit from being run in
parallel. Is this an unwarranted assumption? Or is it that because
there's such a large difference in the estimate for each of the two
subqueries that the optimiser is choosing not to run in parallel,
because there's unlikely to be benefit? Is parallelism going to be of
no use with current disc config (RAID5) or space management (SMS) ?
Just a thought here. With INTRA_PARALLEL on, did you also check dbm cfg
MAX_QUERYDEGREE? It should be at -1 (ANY) or a value <= the number of
CPUs. Also check DFT_DEGREE in the db cfg. It normally defaults to 1 and
should be set like MAX_QUERYDEGREE. If it is at the default, then your
access plan would show no parallelism, as the optimizer will always
select the lesser of SET RUNTIME DEGREE, DFT_DEGREE and MAX_QUERYDEGREE;
in your case 1.
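The CLP commands to check and raise these settings would be along these lines (MYDB is a placeholder database name; note INTRA_PARALLEL only takes effect after an instance restart):

```sql
-- Instance level (dbm cfg): enable intra-partition parallelism
-- and remove the instance-wide degree cap
UPDATE DBM CFG USING INTRA_PARALLEL YES;
UPDATE DBM CFG USING MAX_QUERYDEGREE ANY;

-- Database level (db cfg): raise the default degree used by
-- applications that don't set one themselves
UPDATE DB CFG FOR MYDB USING DFT_DEGREE ANY;
```

An individual application can also override the default per session with SET CURRENT DEGREE.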
HTH, Pierre.
--
Pierre Saint-Jacques
SES Consultants Inc.
514-737-4515
"James Conrad St.John Foreman" wrote: One thing confuses me. We've set INTRA_PARALLEL = YES in the DBM config. However, the access plan graph generated has Parallelism: None.
If it's taking 57 minutes to scan 6.8 million rows in a normal table,
and estimated time drops to half with the union-all approach - I'd
suggest to keep on going if possible! :-)
Personally, I'm most fond of MDC - it eats up more space, and can slow
down loads (inserts too probably). But there's no maintenance
required, and if you can identify 1-2 key attributes typically used in
these selects then it can provide vast performance improvements. And
depending on the table, you might not have to make any changes to it -
so it can be quite a lot less work than DPF, union-all, etc.
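As a sketch of the MDC approach buck suggests (table and column names are illustrative, and the generated month column is one common way to keep the clustering dimension coarse enough that cell extents don't go mostly empty):

```sql
-- Hypothetical MDC layout: rows are block-clustered by month of
-- transaction_date, so a date-range predicate touches only the
-- extents for the months in range.
CREATE TABLE transactions_mdc (
    transaction_id   INTEGER NOT NULL,
    transaction_date DATE    NOT NULL,
    amount           DECIMAL(12,2),
    trans_month      INTEGER GENERATED ALWAYS AS
                     (YEAR(transaction_date) * 100 + MONTH(transaction_date))
)
ORGANIZE BY DIMENSIONS (trans_month);
```

Queries that filter on transaction_date can use the generated column's relationship to the date to restrict the dimension, and unlike a clustering index, MDC keeps this organization without REORGs.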
Just as a reference point: I've got seven-year-old hardware running a
database with 300 million rows returning queries in 1-2 seconds. This
is due to MDC, parallelism, and lots of disks.
buck
James Conrad St.John Foreman wrote: One thing confuses me. We've set INTRA_PARALLEL = YES in the DBM config. However, the access plan graph generated has Parallelism: None.
Obviously without seeing your query plan this is a guess, but are you
seeing two legs of a nested loop join? (i.e. is the operator sitting
above the two legs NLJOIN?) This type of join usually won't take
advantage of intra-partition parallelism because of how the join works
(search for nested loop join in the documentation).
On Windows all of the EDUs are threads within the db2syscs.exe process.
These include the agents that handle queries, the prefetchers and page
cleaners that read and write to the disks, the log writer, etc. (On
Unix/Linux, these are each separate processes.) These threads will
run concurrently regardless of whether intra-partition parallelism
(INTRA_PARALLEL) has been enabled or not.
Thanks: MAX_QUERYDEGREE was already at ANY, but our DFT_DEGREE was set
to 1.
Bit confused about RUNTIME DEGREE - is that in DB CFG or DBM CFG? (may
just be a problem with my sight whilst looking through all of these
parameters)