[DB2 8.2.1 Workgroup on Suse 8.1 on a dual Xeon with 2.5Gb of ram, 6
discs at RAID5]
One of the larger tables in our database is now 6.8 million rows (1 per
financial transaction since 2000).
Every time an amendment is made to a booking, new rows are added to the
table for each transaction, so in general we never have any call to
update old rows. Usually, we only deal with analysis on transactions
in the current financial year or the previous one, but *occasionally*
we'll want to go back further.
So I'm thinking as follows:
Put every row with transaction_date earlier than 1 April 2004 into
transactions_to_end_mar04.
Put every row with a later transaction_date into transactions_current.
Put a clustering index on transaction_date on each table.
Create a view as follows:
CREATE VIEW transactions AS
SELECT * FROM transactions_to_end_mar04
UNION ALL
SELECT * FROM transactions_current;
My analysts will 99 times out of a hundred use the transaction_date
within the WHERE clause of any SQL they write against this view. Will
the clustering indexes be efficacious? Is the optimiser going to
realise it doesn't need to scan transactions_to_end_mar04 when my
analyst asks just for bookings from 1 April 05 onward?
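A sketch of the DDL for this layout (the table names come from the post; the column definitions beyond transaction_date are illustrative, as the real table will have many more columns):

```sql
-- Historical partition, clustered on transaction_date
CREATE TABLE transactions_to_end_mar04 (
    transaction_id   INTEGER NOT NULL,
    transaction_date DATE    NOT NULL,
    amount           DECIMAL(12,2)
);
CREATE INDEX ix_hist_date
    ON transactions_to_end_mar04 (transaction_date) CLUSTER;

-- Current partition, same shape, same clustering
CREATE TABLE transactions_current (
    transaction_id   INTEGER NOT NULL,
    transaction_date DATE    NOT NULL,
    amount           DECIMAL(12,2)
);
CREATE INDEX ix_curr_date
    ON transactions_current (transaction_date) CLUSTER;
```

The CLUSTER keyword on CREATE INDEX tells DB2 to keep the table data ordered by that index as rows are inserted (a REORG against the index establishes the order for existing rows).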
We're trying this now and getting ready to put bits of SQL through the
explain, but I'm not sure if there's anything fundamental that I've
missed. So any comments/warnings/expressions of disdain would be
gratefully received.
Further to this, currently every table sits in the same SMS tablespace.
Would this setup be more efficient in multiple tablespaces, or would
that require DMS tablespaces? (DBA time and resource is costly to us
and we prefer not to use too much of it)
Thanks
James Conrad St.John Foreman wrote: Further to this, currently every table sits in the same SMS tablespace. Would this set up be more efficient in multiple tablespaces, or would that require DMS tablespaces?
Make sure you set INTRA_PARALLEL to YES in the Database Manager Config.
This should make the best use of your processors when processing the
UNION ALL.
Also DMS table spaces with multiple containers might be a better option.
You can have multiple containers (=directories) for SMS, but only when
you create the tablespace. With DMS you can add containers after it is
created.
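A minimal sketch of the DMS option Anton describes (paths and sizes are placeholders; sizes are in pages):

```sql
-- DMS table space striped over two file containers
CREATE TABLESPACE trans_dms
    MANAGED BY DATABASE
    USING (FILE '/db2/cont1/trans.dat' 50000,
           FILE '/db2/cont2/trans.dat' 50000);

-- Unlike SMS, a DMS table space can grow by adding containers later:
ALTER TABLESPACE trans_dms
    ADD (FILE '/db2/cont3/trans.dat' 50000);
```

Spreading containers across separate physical devices is what buys the I/O parallelism; multiple containers on the same RAID5 array gain less.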
--
Anton Versteeg
IBM Netherlands
Well... Thinking purely in SQL, and not partitioning, hardware, etc.
Perhaps you can make use of a clustered table, and use year as a
cluster index. This way you'll get a 3D matrix. Don't know for sure if
this will increase overall performance, but as far as I can see it, it
will improve index scan performance. Don't know from which FixPak this
feature was available.
-R-
James Conrad St.John Foreman wrote: Is the optimiser going to realise it doesn't need to scan transactions_to_end_mar04 when my analyst asks just for bookings from 1 April 05 onward?
Short answer is yes, as long as proper check constraints are in place.
See this (somewhat dated, many improvements since) document for more
details on UNION ALL views processing in DB2: http://www.ibm.com/developerworks/db...rte/index.html
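The check constraints Miro refers to might look like this (constraint names are illustrative); they let the optimizer prove that one branch of the UNION ALL view can return no rows for a given date predicate, so that branch is eliminated from the plan:

```sql
-- Every row in the historical table is before 1 April 2004...
ALTER TABLE transactions_to_end_mar04
    ADD CONSTRAINT ck_hist CHECK (transaction_date < '2004-04-01');

-- ...and every row in the current table is on or after it.
ALTER TABLE transactions_current
    ADD CONSTRAINT ck_curr CHECK (transaction_date >= '2004-04-01');

-- With these in place, a query against the view such as
--   SELECT ... FROM transactions WHERE transaction_date >= '2005-04-01'
-- contradicts ck_hist, so the optimizer can skip
-- transactions_to_end_mar04 entirely.
```

Without the constraints, DB2 has no way to know the tables are disjoint by date and must plan to scan both branches.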
Regards,
Miro.
Thanks.
Now putting this through some testing. We've got some pretty nasty
queries to test with, and with the largest so far, the results are:
Single table, no clustered index: 3,083,170 timerons
Single table, clustered index on transaction date: 3,083,169 timerons
Two tables, both clustered on transaction date, view over the top so it
still looks to the end user like 1 table: 1,209,692 timerons
So if the optimiser is estimating accurately, we should see some good
time savings (the task takes 57 minutes under the current single,
unclustered table configuration).
One thing confuses me. We've set INTRA_PARALLEL = YES in the DBM
config. However, the access plan graph generated has Parallelism :
None. I find that a bit odd, because looking at the graph generated,
it has two large subqueries (one 1,151,000 timerons, and the other
56,000 timerons) that I would assume would benefit from being run in
parallel. Is this an unwarranted assumption? Or is it that because
there's such a large difference in the estimate for each of the two
subqueries that the optimiser is choosing not to run in parallel,
because there's unlikely to be benefit? Is parallelism going to be of
no use with current disc config (RAID5) or space management (SMS) ?
Just a thought here. With INTRA_PARALLEL on, did you also check dbm cfg
MAX_QUERYDEGREE? It should be at -1 (ANY) or a value <= the number of
CPUs. Also check DFT_DEGREE in the db cfg. It normally defaults to 1 and
should be set like MAX_QUERYDEGREE. If it is at the default, then your
access plan would show no parallelism, as the optimizer will always
select the lesser of SET RUNTIME DEGREE, DFT_DEGREE and MAX_QUERYDEGREE;
in your case 1.
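The CLP commands to check and raise these settings would be along these lines (MYDB is a placeholder database name; note INTRA_PARALLEL only takes effect after an instance restart):

```sql
-- Instance level (dbm cfg): enable intra-partition parallelism
-- and remove the instance-wide degree cap
UPDATE DBM CFG USING INTRA_PARALLEL YES;
UPDATE DBM CFG USING MAX_QUERYDEGREE ANY;

-- Database level (db cfg): raise the default degree used by
-- applications that don't set one themselves
UPDATE DB CFG FOR MYDB USING DFT_DEGREE ANY;
```

An individual application can also override the default per session with SET CURRENT DEGREE.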
HTH, Pierre.
--
Pierre Saint-Jacques
SES Consultants Inc.
514-737-4515
"James Conrad St.John Foreman" wrote: One thing confuses me. We've set INTRA_PARALLEL = YES in the DBM config. However, the access plan graph generated has Parallelism: None.
If it's taking 57 minutes to scan 6.8 million rows in a normal table,
and estimated time drops to half with the union-all approach - I'd
suggest to keep on going if possible! :-)
Personally, I'm most fond of MDC - it eats up more space, and can slow
down loads (inserts too probably). But there's no maintenance
required, and if you can identify 1-2 key attributes typically used in
these selects then it can provide vast performance improvements. And
depending on the table, you might not have to make any changes to it -
so it can be quite a lot less work than DPF, union-all, etc.
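As a sketch of the MDC approach buck suggests (table and column names are illustrative, and the generated month column is one common way to keep the clustering dimension coarse enough that cell extents don't go mostly empty):

```sql
-- Hypothetical MDC layout: rows are block-clustered by month of
-- transaction_date, so a date-range predicate touches only the
-- extents for the months in range.
CREATE TABLE transactions_mdc (
    transaction_id   INTEGER NOT NULL,
    transaction_date DATE    NOT NULL,
    amount           DECIMAL(12,2),
    trans_month      INTEGER GENERATED ALWAYS AS
                     (YEAR(transaction_date) * 100 + MONTH(transaction_date))
)
ORGANIZE BY DIMENSIONS (trans_month);
```

Queries that filter on transaction_date can use the generated column's relationship to the date to restrict the dimension, and unlike a clustering index, MDC keeps this organization without REORGs.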
Just as a reference point: I've got seven-year-old hardware running a
database with 300 million rows returning queries in 1-2 seconds. This
is due to MDC, parallelism, and lots of disks.
buck
James Conrad St.John Foreman wrote: One thing confuses me. We've set INTRA_PARALLEL = YES in the DBM config. However, the access plan graph generated has Parallelism: None.
Obviously without seeing your query plan this is a guess, but are you
seeing two legs of a nested loop join? (i.e. is the operator sitting
above the two legs NLJOIN?) This type of join usually won't take
advantage of intra-partition parallelism because of how the join works
(search for nested loop join in the documentation).
On Windows all of the EDUs are threads within the db2syscs.exe process.
These include the agents that handle queries, the prefetchers and page
cleaners that read and write to the disks, the log writer, etc. (On
Unix/Linux, these are each separate processes.) These threads will
run concurrently regardless of whether intra-partition parallelism
(INTRA_PARALLEL) has been enabled or not.
Thanks: MAX_QUERYDEGREE was already at ANY, but our DFT_DEGREE was set
to 1.
Bit confused about RUNTIME DEGREE - is that in DB CFG or DBM CFG? (may
just be a problem with my sight whilst looking through all of these
parameters)